ABSTRACT

We use principal component analysis (PCA) to estimate stellar masses, mean stellar
ages, star formation histories (SFHs), dust extinctions and stellar velocity dispersions
for a set of ~290,000 galaxies with stellar masses greater than $10^{11}\,M_\odot$ and
redshifts in the range 0.4 < z < 0.7 from the Baryon Oscillation Spectroscopic Survey
(BOSS). We find that the fraction of galaxies with active star formation first declines
with increasing stellar mass, but then flattens above a stellar mass of
$10^{11.5}\,M_\odot$ at z ~ 0.6. This is in striking contrast to z ~ 0.1, where the
fraction of galaxies with active star formation declines monotonically with stellar
mass. At stellar masses of $10^{12}\,M_\odot$, therefore, the evolution in the fraction
of star-forming galaxies from z ~ 0.6 to the present day reaches a factor of ~10. When
we stack the spectra of the most massive, star-forming galaxies at z ~ 0.6, we find that
half of their [O III] emission is produced by AGNs. The black holes in these galaxies
are accreting on average at ~0.01 of the Eddington rate. To obtain these results, we use
the stellar population synthesis models of Bruzual & Charlot (2003) to generate a
library of model spectra with a broad range of SFHs, metallicities, dust extinctions
and stellar velocity dispersions. The PCA is run on this library to identify its
principal components over the rest-frame wavelength range 3700-5500 Å. We demonstrate
that linear combinations of these components can recover information equivalent to
traditional spectral indices such as the 4000 Å break strength and H$\delta_A$, with
greatly improved signal-to-noise ratio. In addition, the method is able to recover
physical parameters such as stellar mass-to-light ratio, mean stellar age, velocity
dispersion and dust extinction from the relatively low S/N BOSS spectra. We examine in
detail the sensitivity of our stellar mass estimates to the input parameters in our
model library, showing that almost all changes result in systematic differences in
$\log M_*$ of 0.1 dex or less. The biggest differences are obtained when using different
population synthesis models: stellar masses derived using the Maraston et al. (2011)
models are systematically smaller by up to 0.12 dex at young ages.
1 INTRODUCTION

The mass of a galaxy is perhaps its most fundamental physical property. Many galaxy
properties, such as metallicity (Tremonti et al. 2004) and recent star formation
history (Salim et al. 2005; Juneau et al. 2005; Heavens et al. 2004; Kauffmann et al.
2003a), exhibit tight correlations with stellar mass. Others, such as the
$\alpha$-enhancement (Thomas et al. 2004, 2005; Gallazzi et al. 2006), correlate best
with stellar velocity dispersion, which is a measure of the mass contained in the
bulge-dominated region of the galaxy. The baryonic mass of a galaxy is correlated with
the mass of the dark matter halo in which most of its stars formed (Kauffmann et al.
1997; Benson et al. 2000; Kauffmann et al. 1999; Pearce et al. 2001; Wang et al. 2006).
Finally, studies of how the scaling between galaxy mass and other physical properties
evolves with redshift place important constraints on the mass assembly histories of
galaxies, as well as on the processes that regulate star formation and feedback in
these systems (Chen et al. 2010; Wang et al. 2007).

Galaxy masses are traditionally estimated in two ways: (i) from the motions of stars
and/or gas. This method measures the total mass contained interior to the radius where
one measures these motions, and will include not only the directly observed material,
but also the dark matter present in the galaxy; (ii) from estimates of the stellar
mass-to-light ($M_*/L$) ratio based on fits of multi-band photometry to a grid of
composite stellar population models. In the last decade, the efficiency with which
multi-band photometry has been collected from ground- and space-based observatories
has greatly increased, enabling stellar mass-to-light ratios to be estimated for large
samples of galaxies. If spectra are available, the information carried by stellar
absorption lines enables the recent star formation history of a galaxy to be more
accurately determined (Kauffmann et al. 2003a; Gallazzi & Bell 2009; Panter et al.
2004). Constraints on $M_*/L$ are also improved with spectroscopic information, but in
practice the stellar masses estimated from multi-band photometry and spectral indices
for low-redshift galaxies with luminosities ~ $L_*$ are consistent with less than
~0.1 dex scatter (e.g., Bell et al. 2003; Drory et al. 2004; Salim et al. 2005; Borch
et al. 2006). The stellar masses so obtained correlate strongly with
dynamically-measured masses (e.g., Bell & de Jong 2001; van der Wel et al. 2005;
Cappellari et al. 2006).

The agreement between different methods for estimating stellar mass has led to a
certain amount of complacency in the community (see Conroy et al. 2009 for a review).
It is important to be aware of the following: (1) Uncertainties in the inputs to the
stellar population synthesis codes used to generate the model galaxy grids are a source
of systematic error in stellar mass estimation. Maraston (2005) has reported that
thermally pulsing asymptotic giant branch (TP-AGB) stars contribute significantly to
the near-infrared light of galaxies. Because the physics driving the pulsations is
poorly understood, this phase of stellar evolution may not be very well represented in
many current models. (2) At high redshifts (z ~ 1), one often lacks access to
rest-frame near-infrared data, so star formation histories must be estimated using the
shape of the spectral energy distribution (SED) in the UV/optical. This limitation can
lead to systematic offsets between the stellar masses derived for galaxy samples with
different redshifts, even if the true SEDs are the same (Kannappan & Gawiser 2007).
(3) If the model galaxies do not provide a correct representation of the star formation
histories of the galaxies in the sample or of the transmission of the starlight to the
observer, this will also lead to errors; for example, dusty galaxies and galaxies that
have experienced recent starbursts have poorly-measured masses if the model library
includes only galaxies with smooth star formation histories and no dust. (4) Robust
stellar masses cannot be estimated if the S/N of the data is poor. This last point is
perhaps an obvious one, but it is important to remember that quoted errors on stellar
mass estimates are as important as the estimates themselves.

In this paper, we present a method based on a principal component analysis (PCA;
Madgwick et al. 2003a,b; Lu et al. 2006) to estimate galaxy physical parameters from
rest-frame optical galaxy spectra using stellar population synthesis models. These
parameters include stellar masses, metallicities, dust extinction, velocity
dispersions, and estimates of the recent star formation histories of galaxies such as
luminosity-weighted ages and the recent fraction of the stellar mass formed in bursts.
Unlike much previous work, which relied on narrow-band indices defined in the vicinity
of a limited set of stellar absorption lines to estimate these parameters, our method
employs all the information contained in the rest-frame wavelength range of the
spectrum between 3700 and 5500 Å. Because our method makes use of the full spectrum, it
can be applied to lower S/N data. In addition, the chosen wavelength range is
accessible out to z ~ 0.8, even in optical spectra; this means the analysis can be
applied to both low- and high-redshift galaxy samples in a consistent manner.

We apply our method to a set of spectra from the Sloan Digital Sky Survey Data Release
7 (DR7; Abazajian et al. 2009), as well as a new sample of 290,000 spectra of luminous
galaxies from the Baryon Oscillation Spectroscopic Survey (BOSS; Eisenstein et al.
2011; Schlegel et al. 2009). We present spectrum-based stellar mass estimates for these
galaxies, as well as their recent star formation histories, and use this information to
assess how the recent star formation histories of the most massive galaxies in the
Universe have evolved from z ~ 0.55 to the present day.

Our paper is arranged as follows. In §2, we introduce 
the two data sets. Our methodology for estimating stellar 
mass and recent SFH is developed in §3. The improvements 
of this new method are discussed in §4. We compare our 
stellar masses with those from multi-band photometry in §5. 
We also discuss systematic effects in our estimates of stellar 
mass. Our results on the evolution of massive galaxies are 
presented in §6. We use the cosmological parameters $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$,
$\Omega_m = 0.3$ and $\Omega_\Lambda = 0.7$ throughout this paper.



2 THE SDSS DATA 
2.1 SDSS DR7 



The Sloan Digital Sky Survey (SDSS; York et al. 2000) obtained photometry of nearly a
quarter of the sky and spectra of about one million objects. A drift-scanning mosaic
CCD camera (Gunn et al. 1998) mounted on the SDSS 2.5m telescope at Apache Point
Observatory (Gunn et al. 2006) imaged the sky in the u, g, r, i, z bands (Fukugita et
al. 1996). The imaging data are astrometrically (Pier et al. 2003) and photometrically
(Hogg et al. 2001; Padmanabhan et al. 2008) calibrated and used to select stars,
galaxies, and quasars for follow-up fiber spectroscopy.

The seventh data release (Abazajian et al. 2009) of the SDSS includes ~930,000 galaxy
spectra. The spectra have a wavelength coverage of 3800-9200 Å and are taken through
3" diameter fibers. The instrumental resolution is R ~ 2000 and the dispersion is
$\Delta\log\lambda = 10^{-4}$, where $\lambda$ is the wavelength in Angstroms. The
median S/N per pixel of the spectra ranges from 4 to 30, with a median of 14. In this
paper, we make use of the spectra of galaxies from the "Main Sample" (Strauss et al.
2002), which have Petrosian magnitudes in the range 14.5 < r < 17.6 after correction
for foreground Galactic extinction using the reddening maps of Schlegel et al. (1998).
The redshift range spanned by these galaxies is 0.01 to 0.30 (see Figure 1).

Stellar masses and star formation rates (SFRs) for the Main Sample have been publicly
available since 2008 in the MPA/JHU catalog$^1$. The stellar masses are estimated from
broad-band u, g, r, i, z photometry ("model" fluxes are used, see Stoughton et al.
(2002) for definitions of the magnitudes). The total fluxes are corrected for emission
line contribution by assuming that the relative contribution of emission lines to the
broad-band magnitudes is the same inside the fiber as outside. A large grid of model
star formation histories is generated following the methodology described in Kauffmann
et al. (2003a), and comparison of the observed colors with predictions from these
models allows one to derive maximum likelihood estimates of the z-band stellar
mass-to-light ratios ($M_*/L_z$) of the galaxies. Stellar masses are computed by
multiplying $M_*/L_z$ by the z-band "model" luminosity $L_z$. Masses derived from the
5-band photometry are consistent with previous stellar masses derived using spectral
features (D4000 and H$\delta_A$ in the case of Kauffmann et al. 2003a and a total of
five indices in the case of Gallazzi et al. 2005) with an r.m.s. scatter of 0.1 dex in
$\log M_*$. The SFRs are derived from nebular emission lines (Brinchmann et al. 2004).

The large wavelength coverage and high S/N of the DR7 spectra, and the fact that robust
stellar mass estimates are publicly available, mean that the DR7 main sample
(hereafter we refer to this sample as the DR7 sample for convenience) is very
well-suited for developing and testing our new method of physical parameter estimation
(see §5).



2.2 Baryon Oscillation Spectroscopic Survey 

The SDSS-III project has completed an additional 3000 deg$^2$ of imaging in the
southern Galactic cap in a manner identical to the original SDSS imaging (Aihara et al.
2011). BOSS is obtaining spectra of a selected subset of 1.5 million luminous galaxies
to z ~ 0.7 (N. Padmanabhan et al. 2011, in preparation). The spectrographs have been
significantly upgraded from those used by SDSS-I/II, with improved CCDs with better
red response, high throughput gratings, and an increased number of fibers (1000 instead
of 640). The new fibers are 2" in diameter and the spectra cover the wavelength range
3600-10,000 Å, at a resolution of about 2000. The BOSS galaxy spectra have a median S/N
per pixel ($\Delta\log\lambda = 10^{-4}$) of ~2.5.

1 The MPA/JHU stellar mass catalog can be downloaded from
http://www.mpa-garching.mpg.de/SDSS/DR7 and is available
through the SDSS Catalog Archive Server as described in
Aihara et al. (2011).

Figure 1. Black: the redshift distribution of the DR7 sample;
red: the redshift distribution of the "CMASS" sample of BOSS
high-z luminous galaxies. The two vertical dashed lines indicate
the redshift limits of the "CMASS" sample used in our analysis.

How targets are selected from the SDSS photometry will be described in detail in
Padmanabhan et al. (in preparation). The sample we analyze here is the "CMASS" sample
(so named because it is very approximately stellar-mass limited). The sample is defined
using the following cuts:



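$$
d_\perp > 0.55 \;\;\mathrm{and}\;\; 17.5 < i < 19.9 \;\;\mathrm{and}\;\; i_{\rm fiber2} < 21.5
\;\;\mathrm{and}\;\; i < 19.86 + 1.6\,(d_\perp - 0.8) \;\;\mathrm{and}\;\; r - i < 2
\qquad (1)
$$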



where $d_\perp$ is a "rotated" combination of colors defined as
$d_\perp = (r - i) - (g - r)/8$, and $i_{\rm fiber2}$ is the magnitude of the galaxy
measured within a 2" diameter aperture; this is the amount of light that enters the
fiber. We note that all color cuts are defined using "model" magnitudes, whereas
magnitude limits are given in terms of "cmodel" magnitudes. Two additional cuts are
introduced to reduce contamination by stars:
$z_{\rm psf} - z_{\rm model} \geq 9.125 - 0.46\,z_{\rm model}$ and
$i_{\rm psf} - i_{\rm model} > 0.2 + 0.2 \times (20.0 - i_{\rm model})$ ("psf" refers to
the psfMag quantity in the SDSS database). Redshifts are successfully determined for
95.3% of CMASS galaxies with no apparent dependence on galaxy type.
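The galaxy cuts of equation (1) can be written as a short selection function. The
following is only a minimal sketch: the function and argument names are hypothetical,
the slope 1.6 and pivot 0.8 of the sliding magnitude cut and the r - i < 2 limit are
taken from the reading of equation (1) assumed here, and the two star-galaxy separation
cuts are left out.

```python
def d_perp(g, r, i):
    """Rotated colour d_perp = (r - i) - (g - r)/8, from model magnitudes."""
    return (r - i) - (g - r) / 8.0


def is_cmass(g_mod, r_mod, i_mod, i_cmod, i_fiber2):
    """Galaxy cuts of equation (1). Colours use model magnitudes and
    magnitude limits use cmodel magnitudes, as stated in the text."""
    dp = d_perp(g_mod, r_mod, i_mod)
    return (
        dp > 0.55
        and 17.5 < i_cmod < 19.9
        and i_fiber2 < 21.5
        and r_mod - i_mod < 2.0
        # Sliding cut: slope 1.6 and pivot 0.8 are assumed in this sketch.
        and i_cmod < 19.86 + 1.6 * (dp - 0.8)
    )
```

A red, faint object with g - r = 1.5 and r - i = 1.0 has $d_\perp = 0.8125$ and passes
the color cut, whereas a bluer object with r - i = 0.3 does not.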

These cuts are designed to select massive galaxies with z > 0.45. The constraint that
$d_\perp > 0.55$ identifies galaxies that lie at high enough redshift so that the
4000 Å break has shifted beyond the observer-frame r-band, leading to red r - i colors.
The cut on the i-band magnitude of the galaxy is designed to produce a sample that is
roughly complete down to a limiting stellar mass of $\log M_* \sim 11.2$ (we will come
back to this point later). The analysis presented in this paper includes the ~290,000
CMASS spectra obtained prior to July 2011, which will be made publicly available in
SDSS Data Release 9. We restrict our analysis to galaxies in the redshift range
0.4 < z < 0.7 (see Figure 1).



3 THE METHOD 

Because the CMASS galaxies are faint, the errors on the broad-band colors are large;
this is especially true of colors that include the u-band. The median magnitude errors
are 0.62, 0.16, 0.06, 0.04, 0.09 for the u, g, r, i, z-band measurements, respectively.
In this section, we describe a new method for estimating galaxy properties, including
stellar masses and recent SFHs, using the SDSS spectra rather than photometry. As we
will show, this method can be applied successfully to low S/N data, such as that from
the BOSS survey.

The main concept underlying our approach is the fact 
that a galaxy spectrum can be described as a number of 
orthogonal principal components (PCs). When one applies 
a PCA decomposition to model galaxy spectra, one finds 
that the PCs can be related to the physical properties of 
galaxies, such as their stellar mass-to-light ratios, their re- 
cent SFHs, the average age of their stellar populations, and 
their velocity dispersions. By decomposing each observed 
spectrum into the same set of PCs, and by comparing the 
amplitudes of the components with those calculated for the 
model galaxies, we obtain the likelihood distribution of a 
given parameter P in the space of the values of P allowed 
by the models. In this work, we calculate likelihood distri- 
butions by comparing the amplitudes of the first seven PCs 
(§3.4.1 clarifies this choice) to those calculated for the galax- 
ies in the model library. In the following sections, we provide 
a step-by-step description of our method. 
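As a rough illustration of the idea in the paragraph above, the sketch below weights
each model in a library by how well its PC amplitudes match those of one observed
spectrum and then histograms a parameter P with those weights. The Gaussian
$\exp(-\chi^2/2)$ weighting with independent amplitude errors is an assumption of this
sketch, as are all function and variable names; it is not taken from the text.

```python
import numpy as np


def parameter_pdf(obs_amp, obs_err, model_amp, model_param, bins):
    """Likelihood-weighted histogram of a parameter P over the model library.

    obs_amp, obs_err : (n_pc,) amplitudes of one observed spectrum and their
                       1-sigma errors (treated as independent: an assumption).
    model_amp        : (n_models, n_pc) amplitudes of the library models.
    model_param      : (n_models,) value of P stored for each model.
    """
    chi2 = np.sum(((model_amp - obs_amp) / obs_err) ** 2, axis=1)
    weights = np.exp(-0.5 * (chi2 - chi2.min()))  # subtract min for stability
    pdf, edges = np.histogram(model_param, bins=bins, weights=weights)
    return pdf / pdf.sum(), edges
```

The median and percentiles of the returned distribution would then serve as the
estimate of P and its uncertainty.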



3.1 The spectral range used in the analysis 

We follow two criteria to select the rest-frame spectral range: 
(1) it should be as wide as possible so as to make optimum 
use of all the information carried in the spectrum; (2) to 
minimize systematic effects, the same spectral range should 
be used in the analysis of both the low redshift and the high 
redshift galaxies. Considering the redshift and wavelength
coverage of the DR7 (0.005 < z < 0.3, 3800 < $\lambda$ < 9200 Å) and
of the BOSS (0.4 < z < 0.7, 3600 < $\lambda$ < 10,000 Å) galaxy samples,
we choose rest-frame 3700 Å to 5500 Å for the current
analysis. Spectral features located in this region, including
the 4000 Å break strength (D4000) and Balmer absorption
lines, provide important information about the stellar pop- 
ulations and recent SFHs of the galaxies. 
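The redshift interval over which a fixed rest-frame window stays inside a
spectrograph's observed wavelength range follows from
$\lambda_{\rm obs} = (1+z)\,\lambda_{\rm rest}$. A minimal sketch, with a hypothetical
function name and the nominal wavelength limits quoted above:

```python
def redshift_window(obs_min, obs_max, rest_min=3700.0, rest_max=5500.0):
    """Redshift interval over which [rest_min, rest_max] (Angstrom, rest frame)
    lies fully inside the observed range [obs_min, obs_max]."""
    z_low = max(0.0, obs_min / rest_min - 1.0)  # blue edge must be observed
    z_high = obs_max / rest_max - 1.0           # red edge must be observed
    return z_low, z_high


sdss = redshift_window(3800.0, 9200.0)
boss = redshift_window(3600.0, 10000.0)
```

With the nominal BOSS limits the red edge of the window remains observable up to
z of about 0.82, in line with the z ~ 0.8 quoted in the Introduction.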

Single stellar population (SSP) models that are well-matched in spectral resolution
(e.g. from Bruzual & Charlot 2003, hereafter BC03, and from Maraston & Strömbäck 2011,
hereafter M11) are also available over this wavelength range. BC03 has a spectral
resolution of 75 km s$^{-1}$, while the M11 spectral resolution is 65 km s$^{-1}$,
similar to the instrumental resolution, ~75 km s$^{-1}$, of the DR7 and BOSS spectra.



3.2 Model library 

We generate a library containing 25,000 realizations$^2$ of different SFHs using SSP
models from BC03. The model library is parametrized as follows:

(i) SFHs. Each SFH consists of three parts: a) an underlying continuous model, b) a
series of super-imposed stochastic bursts, c) a random probability for star formation
to stop exponentially (i.e. truncation). In the continuous model, stars are formed from
the time $t_{\rm form}$ to the present according to the following functional form:
${\rm SFR}(t) \propto \exp[-\gamma (t - t_{\rm form})]$. The formation time
$t_{\rm form}$ is uniformly distributed between 13.5 and 1.5 Gyr and the star formation
inverse time-scale $\gamma$ is uniformly distributed over the range 0-1 Gyr$^{-1}$. The
main parameter that controls the bursts is the amplitude A, defined as the fraction of
stellar mass produced during the burst relative to the total mass formed by the
continuous model. A is logarithmically distributed between 0.03 and 4. During the
burst, stars are formed at a constant rate that is independent of A for a time
$t_{\rm burst}$, which is uniformly distributed between $3 \times 10^7$ and
$3 \times 10^8$ yr. Bursts occur at all times after $t_{\rm form}$ with equal
probability. The probability is set in such a way that 15% of the galaxies in the
library experience a burst in the last 2 Gyr.

The existence of a population of massive, compact "post-starburst" galaxies at high
redshifts with little or no ongoing star formation (e.g., Kriek et al. 2006, 2009) has
motivated us to add possible truncations to the SFHs. For 30% of the galaxies in the
library, we truncate the star formation at a random time in the past. Following the
truncation event, the star formation rate evolves as
${\rm SFR}(t > t_{\rm cut}) \propto {\rm SFR}(t_{\rm cut}) \exp[-(t - t_{\rm cut})/\tau]$,
where $t_{\rm cut}$ is the truncation time and $\log\tau$ is uniformly distributed in
the range 7 to 9. We note that in Kauffmann et al. (2003a), the fraction of galaxies
with bursts in the last 2 Gyr was set to be 50%. We have reduced this fraction to 10%
(after truncations are included) because it provides a more uniform distribution in
the light-weighted age of the models. The influence of the choice of the fraction of
galaxies with recent bursts on our physical parameter estimation is discussed in §5.3.

(ii) Metallicity. The BC03 model library includes six metallicities ranging from 0.005
to 2.5 $Z_\odot$; we interpolated the BC03 model grids in metallicity in log-space. 95%
of the model galaxies in our library are distributed uniformly in metallicity from 0.2
to 2.5 $Z_\odot$; 5% of the model galaxies are distributed uniformly between 0.02 and
0.2 $Z_\odot$. The reason for including these very low metallicity models is that
Maraston et al. (2009) have found that the colors predicted for simple single stellar
populations do not provide a good match to the observed colors of the reddest galaxies
in the SDSS. Adding a small fraction (typically 3% of the total stellar mass) of old
metal-poor stars allows one to achieve a significantly better match to the
observations.

(iii) Dust extinction. Dust extinction is modelled using the two-component model
described in Charlot & Fall (2000). The V-band optical depth has a Gaussian
distribution over the range $0 < \tau_V < 6$, with a peak at 1.2 and 68% of the total
probability distribution distributed over the range 0 - 2. Our adopted prior
distribution of $\tau_V$ values is motivated by the observed distribution of Balmer
decrements in SDSS spectra (Brinchmann et al. 2004). The fraction of the optical depth
that affects stellar populations older than $10^7$ yr is parametrized as $\mu$, which
is again modeled as a Gaussian with a peak at $\mu = 0.3$ and a 68 percentile range of
0.1 - 1.

(iv) Velocity dispersion. Each of the model spectra is convolved to a velocity
dispersion that is uniformly distributed over the range of values from 50 to
400 km s$^{-1}$.

2 In order to assess the effects of the "resolution" of our model
parameters, i.e. how estimation of physical parameters depends on
the number of models, we increased the number of models to 100,000,
finding that 25,000 models are enough for the results to converge.
Salim et al. reached the same conclusion.

We adopt the universal initial mass function (IMF) given in Kroupa (2001). For each
model in the library, we store the following properties:

(i) the spectrum over the wavelength range 91-160,000 Å;

(ii) the strengths of the D4000 and H$\delta_A$ indices, measured in the same way as in
the SDSS spectra$^3$;

(iii) the r-band luminosity-weighted age, which is defined as
$t_r = \int_0^t d\tau\,{\rm SFR}(t-\tau)\,f_r(\tau)\,\tau \,\big/ \int_0^t d\tau\,{\rm SFR}(t-\tau)\,f_r(\tau)$,
where $f_r(\tau)$ is the total r-band flux produced by stars at age $\tau$;

(iv) the mass-weighted age, which is calculated as
$t_m = \int_0^t d\tau\,{\rm SFR}(t-\tau)\,\tau \,\big/ \int_0^t d\tau\,{\rm SFR}(t-\tau)$;

(v) the i-band and z-band stellar mass-to-light ratios, $M_*/L_i$ and $M_*/L_z$, of the
model at redshifts between 0 and 0.8 in steps of $\Delta z = 0.05$. Note that we account
for the fraction of the initial stellar mass that is returned to the interstellar
medium by evolved stars (i.e., we output the mass of living stars and remnants, not the
mass formed);

(vi) the star formation history, including the fraction of stars formed in recent
bursts, time of truncation etc.;

(vii) the metallicity;

(viii) the dust parameters $\tau_V$ and $\mu$;

(ix) the stellar velocity dispersion.

The choice of priors is important in Bayesian analysis; 
we test the dependence of our stellar mass estimations on 
the input parameters of the model library in §5.2. 
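The priors listed in this section can be sketched as a simple random draw per model.
This is an illustrative sketch only: the dictionary keys and the function name are
hypothetical, the unit of the truncation time-scale is assumed to be years, and the
dust priors and the burst-timing probability are left out.

```python
import math
import random


def draw_model_parameters(rng):
    """Draw one set of library parameters from the priors of Section 3.2."""
    params = {
        "t_form_gyr": rng.uniform(1.5, 13.5),    # formation time
        "gamma_per_gyr": rng.uniform(0.0, 1.0),  # inverse SF time-scale
        # Burst amplitude A: uniform in log between 0.03 and 4.
        "burst_amplitude": 10 ** rng.uniform(math.log10(0.03), math.log10(4.0)),
        "t_burst_yr": rng.uniform(3e7, 3e8),     # burst duration
        "v_disp_kms": rng.uniform(50.0, 400.0),  # velocity dispersion
    }
    # 30% of the models are truncated, with log(tau) uniform in [7, 9]
    # (tau taken to be in years, which is an assumption of this sketch).
    params["truncated"] = rng.random() < 0.30
    params["log_tau_cut"] = rng.uniform(7.0, 9.0) if params["truncated"] else None
    # 95% of models: 0.2-2.5 Z_sun; 5% of models: 0.02-0.2 Z_sun.
    if rng.random() < 0.95:
        params["metallicity_zsun"] = rng.uniform(0.2, 2.5)
    else:
        params["metallicity_zsun"] = rng.uniform(0.02, 0.2)
    return params


rng = random.Random(0)
library = [draw_model_parameters(rng) for _ in range(25000)]
```

Each parameter set would then be passed to the population synthesis code to produce
one model spectrum and the stored properties (i) to (ix).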

3.3 Identifying the significant principal 
components of the spectral library 

Our method makes use of Principal Component Analysis (PCA), a standard multivariate
analysis technique (see Budavári et al. 2009 for a recent discussion). A spectrum
containing M wavelength points can be regarded as a single point in an M-dimensional
space. A group of spectra form a cloud of points in this space. PCA searches for a
vector (the first principal component) along which the cloud of points has as high a
variance as possible. Each succeeding component in turn has the highest variance
possible under the constraint that it be orthogonal to (i.e. uncorrelated with) the
preceding components.

3 The values of D4000 and H$\delta_A$ for DR7 galaxies can be found
in the MPA/JHU catalog. The band passes over which these
two indices are measured are given in Balogh et al. (1999) and
Worthey & Ottaviani (1997).

Figure 2. From top to bottom: the mean spectrum of the model
library followed by the first to the seventh eigenspectra.



Before we run the PCA code on the library of models, we mask the regions around nebular
emission lines in the model spectra. Our models do not include emission lines, and it
is important to treat the models and the real data in exactly the same manner. We mask
500 km s$^{-1}$ around the [O II] 3726.03, [O II] 3728.82, H8 3889.05, [Ne III] 3869.06,
H$\delta$ 4101.73, H$\gamma$ 4340.46, H$\beta$ 4861.33, [O III] 4959.91, and
[O III] 5007.84 Å lines. Each spectrum in the masked library is normalized to its mean
flux between 3700 and 5500 Å.

Let us express the i-th normalized spectrum as $S_{i,k_\lambda}$, where i is an integer
running over the 25,000 model galaxies and $k_\lambda$ is an integer running over each
pixel in the spectrum. We calculate the mean spectrum of the masked library and
subtract this from each of the model spectra. We then run our PCA code on the
"residual" spectra. Figure 2 presents the mean spectrum and the first seven PCs for our
input model library.
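The masking, normalization, mean subtraction and decomposition described above can be
sketched with a singular value decomposition. This is a minimal illustration rather
than the code used for the paper: the function names are hypothetical, the mask is
taken to be $\pm 500$ km s$^{-1}$ around each line centre (the text does not say
whether 500 km s$^{-1}$ is a half or full width), and the spectra are assumed to be
already sampled on a common rest-frame grid.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s


def emission_line_mask(wave, line_centers, half_width_kms=500.0):
    """True for pixels kept, False within +/- half_width_kms of any line."""
    keep = np.ones(wave.shape, dtype=bool)
    for center in line_centers:
        keep &= np.abs(wave - center) / center * C_KMS > half_width_kms
    return keep


def library_pca(spectra, n_pc=7):
    """spectra: (n_models, n_pixels) array of masked model spectra.
    Returns the mean spectrum and the first n_pc eigenspectra (as rows)."""
    normed = spectra / spectra.mean(axis=1, keepdims=True)  # unit mean flux
    mean_spec = normed.mean(axis=0)
    residual = normed - mean_spec
    # Rows of vt are orthonormal directions ordered by decreasing variance.
    _, _, vt = np.linalg.svd(residual, full_matrices=False)
    return mean_spec, vt[:n_pc]
```

In use, one would apply `emission_line_mask` to the wavelength grid first and pass
only the kept pixels of every model spectrum to `library_pca`.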

As expected, the mean spectrum is typical of that of a 
galaxy with an intermediate age stellar population. The first 
PC is relatively featureless. As we will show, it provides a 
first-order measure of the age of the stellar population and 
it is strongly correlated with both the 4000 Å break and Balmer
absorption line strengths. The second PC is quite noisy,
but as we will show in §3.4.1, it encodes information about
the stellar velocity dispersion of the galaxy. The third PC
contains information about velocity dispersion and metallicity.
The fourth PC clearly recovers information contained
in the Balmer absorption lines, even though the line centers
have been masked. Ca II (H+K) and Mg b absorption lines
are clearly visible in the fifth PC; as we will show, this component
carries much information about galaxy metallicity. It
is difficult to determine what information is encoded in the 
sixth and seventh PCs by simple visual inspection. In the 
next section, we present a more quantitative analysis that 
demonstrates that these two PCs encode information about 
velocity dispersion and metallicity, respectively. 



3.4 Decomposing each model and observed 
spectrum into its PC representation 

3.4.1 Projection of the model library

The i-th model spectrum $S_{i,k_\lambda}$ is projected onto the eigenspectra as
follows:

$$ S_{i,k_\lambda} = M_{k_\lambda} + \sum_{\alpha} C^m_{i\alpha}\,E_{\alpha,k_\lambda} + R_{i,k_\lambda}, \qquad (2) $$

where $M_{k_\lambda}$ is the mean spectrum of the model library. The integer
$k_\lambda$ indexes the rest-frame wavelength bin of the spectrum. $C^m_{i\alpha}$ is
the amplitude of the $\alpha$-th PC $E_{\alpha,k_\lambda}$ (note that $\alpha$ ranges
from 1 to 7, and the superscript m refers to the fact that the models are being
projected onto PC space in this section). $C^m_{i\alpha}$ can be expressed as

$$ C^m_{i\alpha} = \sum_{k_\lambda} \left( S_{i,k_\lambda} - M_{k_\lambda} \right) E_{\alpha,k_\lambda}. \qquad (3) $$

$R_{i,k_\lambda}$ is the residual of the i-th spectrum from its PC representation (it
can be regarded as a measure of the theoretical "noise" in the decomposition). The
covariance matrix of the theoretical "noise" can be written as

$$ K^{\rm th}_{k_\lambda, k'_\lambda} = \frac{1}{N_{\rm mod}} \sum_{i=1}^{N_{\rm mod}} R_{i,k_\lambda}\,R_{i,k'_\lambda}, \qquad (4) $$

where $N_{\rm mod}$ is the number of models in the library (note that this noise
covariance is indexed by rest-frame wavelength).
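Equations (2) to (4) translate directly into matrix operations. The sketch below
assumes the eigenspectra are stored as orthonormal rows of an array and that the
spectra have already been masked and normalized; the function names are hypothetical.

```python
import numpy as np


def project(spectra, mean_spec, eigenspectra):
    """Equation (3): amplitudes C[i, a] = sum_k (S[i, k] - M[k]) * E[a, k]."""
    return (spectra - mean_spec) @ eigenspectra.T


def residual_covariance(spectra, mean_spec, eigenspectra):
    """Equations (2) and (4): residuals R left after removing the PC
    reconstruction, and their covariance averaged over the models."""
    amplitudes = project(spectra, mean_spec, eigenspectra)
    residuals = spectra - mean_spec - amplitudes @ eigenspectra
    return residuals.T @ residuals / spectra.shape[0]
```

For a spectrum that lies exactly in the space spanned by the retained eigenspectra,
the residual and hence its contribution to the covariance vanish.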

Although the PC representation of spectra is compact 
and mathematically convenient, it is not a priori clear what 
physical information is encoded in each of the components. 
In some cases (see Figure 2), one can simply eyeball the 
eigenspectrum and make an educated guess as to its "mean- 
ing" , but in other cases very little can be gleaned from simple 
visual inspection. 

In order to develop a better understanding of the information encoded in the PCs, we
search for the best correlation between
$Z + \sum_\alpha (X_\alpha \times C^m_{i\alpha})$ and a variety of galaxy parameters
(P) that we store for each model spectrum. Examples of P include spectral properties
such as D4000, H$\delta_A$, and velocity dispersion ($V_{\rm disp}$), as well as model
parameters such as stellar mass-to-light ratio, light-weighted age, dust extinction,
metallicity, and fraction of stars formed in the last 1 Gyr ($F_*$). This can be
thought of as finding the linear combination of the PC amplitudes that best predicts
the parameter P when averaged over all the model spectra in the library. The zero
point Z and coefficients $X_\alpha$ are calculated by minimizing

$$ \sigma = \sum_{i=1}^{N_{\rm mod}} \Big[ \sum_{\alpha} X_\alpha \times C^m_{i\alpha} + Z - P_i \Big]^2. \qquad (5) $$
Figure 3. The correlation between 8 different model galaxy spectral properties or
parameters, (a) D4000; (b) H$\delta_A$; (c) velocity dispersion; (d) z-band stellar
mass-to-light ratio; (e) luminosity-weighted age; (f) dust extinction; (g) metallicity;
(h) the fraction of stars formed in the last Gyr, and the linear combination of
$C^m_\alpha$, $Z + \sum_{\alpha=1}^{7} (X_\alpha \times C^m_\alpha)$, that minimizes the
scatter in the correlation (see text). The values of the coefficient $X_\alpha$ and the
zero point Z for each case are listed in Table 1.

When we perform this exercise, we increase the number of PCs used in the projection one
at a time, each time checking the mean correlation between
$Z + \sum_\alpha (X_\alpha \times C^m_{i\alpha})$ and P as well as its scatter. We find
that our results converge when $\alpha = 7$, and therefore use the first seven PCs in
our analysis.
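The minimization of equation (5) is an ordinary linear least-squares problem, and the
convergence check amounts to refitting with an increasing number of PCs. A minimal
sketch, with hypothetical function names and the standard deviation of the fit
residuals used as the measure of scatter (an assumption of this sketch):

```python
import numpy as np


def fit_linear_predictor(amplitudes, parameter):
    """Least-squares X (one coefficient per PC) and zero point Z that
    minimise sum_i (sum_a X[a] * C[i, a] + Z - P[i])**2."""
    design = np.column_stack([amplitudes, np.ones(len(parameter))])
    solution, *_ = np.linalg.lstsq(design, parameter, rcond=None)
    return solution[:-1], solution[-1]


def scatter_vs_n_pc(amplitudes, parameter):
    """Residual scatter of the best fit when using the first 1, 2, ... PCs."""
    scatter = []
    for n in range(1, amplitudes.shape[1] + 1):
        x, z = fit_linear_predictor(amplitudes[:, :n], parameter)
        scatter.append(np.std(amplitudes[:, :n] @ x + z - parameter))
    return scatter
```

Here `amplitudes` is the (number of models) by (number of PCs) array of $C^m_{i\alpha}$
and `parameter` holds the stored value of P for each model.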

Figure 3 shows the correlations between
$Z + \sum_\alpha (X_\alpha \times C^m_{i\alpha})$ and a variety of different galaxy
parameters P. We have included D4000 and H$\delta_A$ in this set even though they are
not physical parameters, because they were used extensively in our previous work and we
would like to understand how they relate to our new system of PCs. In addition, we
include stellar velocity dispersion, i-band mass-to-light ratio, r-band light-weighted
mean stellar age, metallicity, dust extinction and the fraction of stars formed in the
last Gyr. We see that we are able to recover very accurate estimates of D4000,
H$\delta_A$, velocity dispersion and dust content from the principal components. This
is not surprising, because the 4000 Å break and the Balmer absorption lines are the
strongest features present in the spectra for the wavelength
Table 1. Values of $X_\alpha$ and Z for each spectral property or physical parameter P

P                             C1        C2        C3        C4        C5        C6        C7         Z
(1)                          (2)       (3)       (4)       (5)       (6)       (7)       (8)       (9)
D4000                      0.049    -0.018     0.016     0.083    -0.168     0.039     0.029     1.619
H$\delta_A$               -0.517     0.369    -0.460    -2.551     0.905    -0.454    -0.497     2.808
$V_{\rm disp}$             0.267   135.517   -59.563    50.294   -42.942  -176.941    17.609   226.919
log $M_*/L$               -0.051    -0.059    -0.079     0.103    -0.084     0.015    -0.022    -4.209
log age (yr)               0.071    -0.061    -0.105     0.189     0.111     0.078     0.340     9.385
$\mu \times \tau_V$        0.017     0.295     0.366    -0.678     0.445    -0.176    -0.265     0.528
log metallicity            0.009    -0.017     0.070    -0.041    -0.710    -0.088    -0.821    -1.692
log $F_*$                 -0.437     0.333    -0.245    -2.032    -0.248    -0.259    -1.855    -3.631

Figure 4. This figure shows the relative contribution of each PC
to the same spectral properties/parameters displayed in Figure 3.
They are color-coded as follows: black: D4000; red: velocity
dispersion; blue: luminosity-weighted age; green: metallicity;
grey: H$\delta_A$; orange: stellar mass-to-light ratio; cyan: dust
extinction; magenta: fraction of stars formed in the last Gyr.



F pc (a) 



y 7 t 1 



\x* x a 



(6) 



The results are shown in Figure 4. For each parameter P in
Figure 3, we plot $F_{pc}$ as a function of $\alpha$, where
$\alpha$ is the index of the PC. The results are largely
consistent with our previous discussion of Figure 2.
Information about D4000, H$\delta_A$, stellar mass-to-light
ratio, light-weighted age and the fraction of stars formed in
the last Gyr is primarily contained in PC1, with lesser
contributions from PC4 and PC5. As we have discussed, PC1
provides a measure of the continuum shape, whereas PC4 is
similar to the component defined in Wild et al. (2007) that
provides a measure of the "bursty" nature of the past star
formation history. Information about velocity dispersion is
encoded in PC2, PC3 and PC6. Information about metallicity is
mainly contributed by PC5 and PC7. Interestingly, PCs 2
through 5 contribute almost equally to the estimation of
stellar mass-to-light ratio.



range that we have chosen. Likewise, increasing extinction 
and velocity dispersion modify the shape of the spectrum 
and the width of the spectral features in a roughly linear 
way, so it is expected that the correlation with the appro- 
priate combination of PC components will be very tight. 

On the other hand, there are well-known degeneracies
between stellar age and metallicity that affect many stellar
features (O'Connell 1986). In past work, certain specific
features have been identified as being key to breaking this
degeneracy (e.g., Worthey 1994; Vazdekis & Arimoto 1999;
Maraston & Thomas 2000; Le Borgne et al. 2004), so it is
interesting to see whether our PCA technique is capable of
doing the same. In Figure 3, Panels (d) and (e) indicate
that one is able to recover reasonably accurate estimates of
stellar mass-to-light ratio and mean stellar age. Panel (h)
shows that our method is able to cleanly identify galaxies in
which more than ~1% of the stellar population formed in
the last Gyr. Panel (g) shows that metallicity can be
recovered for values that are below solar (log metallicity
< -1.7). We note, however, that we have not yet considered the
effect of varying element abundance ratios, which significantly
complicates metallicity estimation in real elliptical galaxies
(Thomas et al. 2002).

In Table 1, we list the set of $X_\alpha$ and Z values for each
parameter P. In Figure 4 we attempt to illustrate the relative
"importance" of each of the PCs in estimating different galaxy
parameters. We define $F_{pc}(\alpha)$ as



3.4.2 Projection of the real data

In this section, we describe how we apply the PCA method- 
ology to observed galaxy spectra. The steps are as follows: 

(i) The galaxy spectra are corrected for foreground Galac- 
tic extinction and the wavelength scale is shifted from vac- 
uum to air to match the models. 

(ii) The set of emission lines listed in §3.3 is masked. In
addition, we found it necessary to mask 500 km s$^{-1}$ around
the H I 3770.63, 3797.90, 3835.38, 3889.049, 3970.072,
4101.734, 4340.464, He I 4387.93, 5047.74, He II 5015.68,
[Ne III] 3967.79 and [O III] 4364.21 Å lines in the subset of
strong emission line galaxies with EQW of H$\beta$ $< -5$ (12%
of DR7, 1% of CMASS). We have checked that our results are
robust to the choice of mask size.

(iii) The observed spectrum and its error array are
normalized by dividing by the flux density averaged over the
full observed wavelength range. We then apply an integer
pixel offset to shift them from observed to rest-frame
wavelength$^4$. We denote the normalized flux density and its
error arrays as $O_{k_\lambda}$ and $Eps_{k_\lambda}$,
respectively. For "bad" pixels and night sky lines identified
in the SDSS mask array, we set the



4 This is possible because the wavelength interval of SDSS
spectra is a constant in log-space with $\Delta\log\lambda =
10^{-4}$. We rebin the model spectra to the same wavelength
interval.
Figure 5. The rescaled mean observational error covariance matrix $Cov_{l_\lambda,l'_\lambda}$ in the wavelength interval 4000-9000 Å derived from
DR7 repeat spectra (left) and BOSS repeat spectra (right).

Figure 6. This figure shows $C_1$ vs. $C_2$, $C_3$ vs. $C_4$ and $C_5$ vs. $C_6$ for DR7 galaxies (top) and for BOSS galaxies (bottom). Models are
shown in red contours, while data are shown in grey scales. We have convolved the model spectra with the errors appropriate for DR7 and
CMASS when generating the PCA components for this plot. The outermost contour encloses 95% of the models.



pixel values in the observed normalized spectrum,
$O_{k_\lambda}$, to the corresponding value in the mean
spectrum of the model library scaled to the data, $a \times
M_{k_\lambda}$, where $a$ is the mean flux of $O_{k_\lambda}$
between rest-frame 3700 and 5500 Å; this scaling takes the
different normalizations of models and data into account. (We
choose this normalization of the data for the convenience of
estimating the mean observational error covariance matrices,
see Eq. 9.) The corresponding pixel values in the error array
are set to 10 times the mean error of all the good pixels.

(iv) The coefficients of the PC components of the ob- 
served spectrum are 



(7) 
Note that the index $k_\lambda$ ranges over the pixels between
3700 and 5500 Å in the rest frame.
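
The preprocessing in steps (ii) and (iii) can be sketched as follows.
This is a minimal illustration under stated assumptions, not the
pipeline used for the paper: the function names are hypothetical, the
500 km/s mask is treated as a half-width, and the scale factor $a$ is
computed from good pixels only.

```python
import numpy as np

C_KMS = 299792.458   # speed of light in km/s
DLOG = 1.0e-4        # SDSS pixel size in log10(wavelength), see footnote 4


def rest_frame_shift(z):
    """Integer pixel offset between observed and rest frame on a log grid."""
    return int(round(np.log10(1.0 + z) / DLOG))


def line_mask(wave, line_centres, half_width_kms=500.0):
    """True for pixels within +/- half_width_kms of any listed line centre."""
    wave = np.asarray(wave, dtype=float)
    mask = np.zeros(wave.shape, dtype=bool)
    for centre in line_centres:
        velocity = C_KMS * (wave / centre - 1.0)
        mask |= np.abs(velocity) <= half_width_kms
    return mask


def patch_bad_pixels(flux, error, bad, mean_model):
    """Replace bad pixels by a * M and inflate their errors tenfold."""
    flux = np.array(flux, dtype=float)
    error = np.array(error, dtype=float)
    bad = np.asarray(bad, dtype=bool)
    good = ~bad
    a = flux[good].mean()                   # assumption: good pixels only
    flux[bad] = a * np.asarray(mean_model, dtype=float)[bad]
    error[bad] = 10.0 * error[good].mean()  # 10 x mean error of good pixels
    return flux, error, a
```

Because the pixel grid is uniform in log wavelength, the redshift
correction is a pure index shift and needs no interpolation.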

3.4.3 Error estimation for the PC coefficients

This section describes how we estimate the errors on $C_\alpha$
as calculated in equation (7). Let us write the observed
spectrum over the restricted wavelength range we are modeling
as

$$O_{k_\lambda} = a \times M_{k_\lambda} + \sum_\alpha C_\alpha^{\rm true} E_{\alpha,k_\lambda} + N^{\rm th}_{k_\lambda} \times a + N^{\rm obs}_{k_\lambda} \qquad (8)$$

where the first two terms in equation (8) represent the "true"
PC representation of the galaxy in the absence of errors of
any sort. $N^{\rm th}_{k_\lambda}$ and $N^{\rm obs}_{k_\lambda}$
are independent Gaussian noise vectors representing the
"theoretical" and the "observational" errors. These two noise
vectors have covariance matrices $Cov^{\rm th}_{k_\lambda,k'_\lambda}$
and $Cov^{\rm obs}_{k_\lambda,k'_\lambda}$, respectively.

$Cov^{\rm th}_{k_\lambda,k'_\lambda}$ is the same for each
observed galaxy and is given by equation (4). The covariance
matrix of the observational errors differs from one galaxy to
the next. The diagonal terms are given by $Eps^2_{k_\lambda}$,
the square of the normalized error array of the particular
galaxy in question. The off-diagonal terms are somewhat tricky
to evaluate, because they depend on details of the
instrumental response, flux calibration errors, etc.

Both the DR7 main galaxy sample and the BOSS 
CMASS sample include a large number of galaxies that were 
observed multiple times, due to overlaps between different 
spectroscopic fiber plug plates. These "repeat spectra" are 
extremely useful for assessing uncertainties in a variety of 
estimated quantities. We identify a sample of 48,000 "re- 
peat galaxy spectra" for DR7 (19,000 for CMASS); these 
two samples are used to construct the mean observational 
error covariance matrices for DR7 and CMASS galaxies as 

$$Cov_{l_\lambda,l'_\lambda} = \frac{1}{2N_{\rm pair}} \sum_{j=1}^{N_{\rm pair}} (O_{j,1,l_\lambda} - O_{j,2,l_\lambda})\,(O_{j,1,l'_\lambda} - O_{j,2,l'_\lambda}) \qquad (9)$$

where $O_{j,a,l_\lambda}$ is the normalized flux of the a-th
observation (a = 1 or 2) of the j-th pair. Bad pixels are
replaced by the median of 200 neighbouring pixels.
$l_\lambda$ ranges over all the pixels in the spectrum. Note
that in contrast to the index $k_\lambda$ used above,
$l_\lambda$ indexes wavelength in the observed frame, and
$O_{j,a,l_\lambda}$ refers to the observed spectrum.
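
A repeat-spectra covariance of this kind can be sketched in a few lines.
The function name and array layout are illustrative assumptions, and the
normalization 1/(2 N_pair) is an assumption of this sketch, chosen so
that the diagonal equals the variance of a single observation when the
two observations carry independent noise of equal amplitude.

```python
import numpy as np


def repeat_covariance(obs1, obs2):
    """Mean error covariance from pairs of repeat spectra.

    obs1, obs2 : arrays of shape (n_pair, n_pix) holding the normalised
                 fluxes of the first and second observation of each pair,
                 on a common observed-frame pixel grid.
    """
    diff = np.asarray(obs1, dtype=float) - np.asarray(obs2, dtype=float)
    n_pair = diff.shape[0]
    # Sum over pairs of the outer product of the difference spectrum.
    return diff.T @ diff / (2.0 * n_pair)
```

The result is an (n_pix, n_pix) symmetric matrix; bad-pixel replacement
is assumed to have been done before the call.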

In Figure 5 we show a diagrammatic representation
of the observational covariance matrices from the DR7
and BOSS repeat observations over the wavelength interval
4000-9000 Å. We have stretched the scale in order
to distinguish positive from negative values and to have
a near-logarithmic scale near zero (in practice, we plot
$\mathrm{sign}(Cov_{l_\lambda,l'_\lambda})\,|Cov_{l_\lambda,l'_\lambda}|^{0.3}$).
The absolute values of the elements of the BOSS mean
observational error covariance matrix are much larger than
those for DR7, reflecting the fact that the BOSS data are at
lower S/N. The strong shift in sign at 6000 Å reflects the
change between the blue and the red channels of the SDSS
spectrograph. When using the matrices below (see equation 11),
the off-diagonal terms of $Cov^{\rm obs}_{k_\lambda,k'_\lambda}$
are set to the values of $Cov_{l_\lambda,l'_\lambda}$ at the
corresponding observer-frame wavelengths.
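
The display stretch described above is a signed power law. A minimal
sketch (the function name is a hypothetical choice):

```python
import numpy as np


def signed_power_stretch(cov, power=0.3):
    """Return sign(cov) * |cov|**power, preserving the sign of each element."""
    cov = np.asarray(cov, dtype=float)
    return np.sign(cov) * np.abs(cov) ** power
```

With an exponent below one, small values are expanded relative to large
ones, which is what makes the stretched scale look nearly logarithmic
close to zero while keeping positive and negative elements distinct.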

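The difference between "true" and estimated PC coefficients
is

$$\delta C_\alpha = C_\alpha^{\rm true} - C_\alpha = \sum_{k_\lambda} E_{\alpha,k_\lambda}\,(N^{\rm th}_{k_\lambda} \times a + N^{\rm obs}_{k_\lambda}) \qquad (10)$$

The covariance matrix of this difference (i.e. the covariance
in the errors on the different PC coefficients) is then given
by

$$Cov^{pc}_{\alpha,\alpha'} = \langle \delta C_\alpha\,\delta C_{\alpha'} \rangle = \sum_{k_\lambda,k'_\lambda} E_{\alpha,k_\lambda}\,(a^2\,Cov^{\rm th}_{k_\lambda,k'_\lambda} + Cov^{\rm obs}_{k_\lambda,k'_\lambda})\,E_{\alpha',k'_\lambda} \qquad (11)$$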
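
The propagation of the pixel noise into the PC coefficients can be
sketched numerically. The snippet assumes, following equation (10),
that the coefficient error is the projection of $a \times N^{\rm th} +
N^{\rm obs}$ onto the eigenspectra and that the two noise terms are
independent; the function name and array shapes are illustrative
assumptions.

```python
import numpy as np


def pc_covariance(eigvecs, cov_th, cov_obs, a):
    """Covariance of the errors on the PC coefficients.

    eigvecs : (n_pc, n_pix) eigenspectra E_{alpha,k}
    cov_th  : (n_pix, n_pix) theoretical covariance (same for all galaxies)
    cov_obs : (n_pix, n_pix) observational covariance of this galaxy
    a       : scalar flux scale of the observed spectrum
    """
    e = np.asarray(eigvecs, dtype=float)
    total = a ** 2 * np.asarray(cov_th, dtype=float) + np.asarray(cov_obs, dtype=float)
    # Independent noise terms: the cross term <N_th N_obs> vanishes.
    return e @ total @ e.T
```

The output is an (n_pc, n_pc) matrix whose inverse enters the
chi-squared comparison between models and data.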



3.5 Comparison of the PC components derived 
for the models and for the data 

If parameter estimation using the model library is to be
robust, we must make sure that the models cover the same
range of PC space as the real data. Figure 6 shows $C_1$ vs.
$C_2$, $C_3$ vs. $C_4$ and $C_5$ vs. $C_6$ for DR7 galaxies
(top) and for BOSS galaxies (bottom). Models are shown in red
contours, while data are shown in grey scales. We have
convolved the model spectra with the errors appropriate for
DR7 and CMASS when generating the PCA components for this
plot, so that the comparison between models and observations
is realistic. As can be seen, data and models cover roughly
the same regions of PC parameter space for both the DR7 and
the CMASS samples. We note that the worst discrepancies
between models and data are for the $C_4$ index, which encodes
information about Balmer absorption lines. This problem
was previously noted by Wild et al. (2007) in their analysis
of post-starburst galaxies using data from the SDSS Data
Release 4 (DR4; Abazajian et al. 2005). Interestingly,
agreement between models and data in the $C_4$ versus $C_3$
plane is significantly better for the BOSS galaxies, which are
considerably more massive and have metallicities close to
solar, where the coverage by the stellar libraries is more
complete.



3.6 Estimation of physical parameters and their 
uncertainties 

For an observed galaxy at redshift z, we select only models
that have an age smaller than the age of the universe at
that redshift. We step through the models one at a time,
calculating $\chi^2$ as follows:

$$\chi_i^2 = \sum_{\alpha,\alpha'} (C^i_\alpha - C_\alpha)\,P_{\alpha,\alpha'}\,(C^i_{\alpha'} - C_{\alpha'}) \qquad (12)$$

where $P_{\alpha,\alpha'}$ is the inverse of $Cov^{pc}_{\alpha,\alpha'}$.

We define a weight $w_i = \exp(-\chi_i^2/2)$ to describe the
similarity between the given galaxy and model i. We then
build a probability distribution function (PDF) for each
parameter P, by looping over all the model galaxies in the
library and by summing the weights $w_i$ at the value of P
for each model. We normalize the final PDF and note the
parameter values at the 2.5, 16, 50 (median), 84 and 97.5
percentiles of the cumulative PDF. We adopt the median
as our nominal estimate of P and the 16-84 percentile
range of the PDF as its ±1σ confidence interval.
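
The estimation procedure of this section can be sketched as below. This
is an illustration under stated assumptions rather than the code used
for the paper: the function names are hypothetical, the model library is
assumed to be already restricted to ages allowed at the galaxy redshift,
and the cumulative PDF is evaluated directly on the sorted model values
instead of on a binned histogram.

```python
import numpy as np


def chi2_models(c_models, c_obs, cov_pc):
    """Chi-squared of each model against the observed PC coefficients.

    c_models : (n_model, n_pc) PC coefficients of the library models
    c_obs    : (n_pc,) PC coefficients of the observed galaxy
    cov_pc   : (n_pc, n_pc) covariance of the errors on c_obs
    """
    p = np.linalg.inv(cov_pc)
    d = np.asarray(c_models, dtype=float) - np.asarray(c_obs, dtype=float)
    return np.einsum("ia,ab,ib->i", d, p, d)


def weighted_percentiles(values, weights, q=(2.5, 16, 50, 84, 97.5)):
    """Percentiles of the weighted cumulative distribution of `values`."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    v = values[order]
    cdf = np.cumsum(weights[order])
    cdf /= cdf[-1]
    idx = np.searchsorted(cdf, np.asarray(q) / 100.0)
    return v[np.minimum(idx, len(v) - 1)]


def estimate_parameter(param_models, c_models, c_obs, cov_pc):
    """Median and 16-84 percentile range of the PDF of one parameter."""
    chi2 = chi2_models(c_models, c_obs, cov_pc)
    # Subtracting the minimum avoids underflow; the normalised PDF is unchanged.
    w = np.exp(-0.5 * (chi2 - chi2.min()))
    p2, p16, p50, p84, p97 = weighted_percentiles(param_models, w)
    return p50, (p16, p84)
```

The same weights can be reused for every parameter P, since they depend
only on the PC coefficients and not on the parameter being estimated.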
Figure 8. The linear combinations of the PC coefficients
predicted to provide the best representation of D4000,
H$\delta_A$ and stellar velocity dispersion (see Section 3 for
details) are plotted as a function of direct measurements of
these quantities from the SDSS pipeline. 5 PCs are used for
the bottom-right panel, while 7 PCs are used for the other
three panels. The sample has been restricted to DR7 galaxies
with spectral median S/N per pixel greater than 10.



4 ADVANTAGES OF THE PRINCIPAL 
COMPONENT METHOD 

In the previous section, we described our methodology for 
deriving a set of principal components from a library of 
model spectra, for decomposing real galaxy spectra into lin- 
ear combinations of these components, and for estimating 
errors on the derived PC amplitudes. We also outlined a 
Bayesian technique for parameter estimation using the input 
model library. In this section, we will attempt to illustrate 



the power of our methodology by means of some scientific 
applications. 

4.1 A robust method for stellar continuum fitting 

To decompose the galaxy spectrum into the emission
produced by ionized gas and that produced by stars, one usually
fits a model that describes the stellar component of the
galaxy and then subtracts this model from the observed
spectrum. Certain emission lines, for example H$\beta$,
frequently occur in deep absorption troughs, particularly in
early-type galaxies, so it is important that the model for the
stellar continuum be as accurate as possible.

In previous work (Brinchmann et al. 2004; Tremonti et al.
2004), we employed a template-fitting procedure to model the
stellar continuum. As described in Section 3, instead of a set
of templates, our new method finds the linear combination of
PC eigenspectra that best fits the stellar continuum. In
Figure 7 we show two examples of PCA fits to DR7 (left) and
BOSS (right) galaxy spectra over the wavelength interval from
3700 to 5500 Å. The upper panel shows the spectrum of a
star-forming galaxy with strong Balmer emission and a small
4000 Å break, while the lower panel shows a typical early-type
galaxy with strong stellar absorption features. The black
lines are the normalized observations and the red lines show
the best PCA fits, which reproduce the observed spectra well.

4.2 Improvement in S/N over Lick Indices when 
using PC-based techniques 

It is traditional to focus on specific stellar absorption
features over a narrow wavelength interval when analyzing the
ages and metallicities of the stellar populations of galaxies
from their spectra. For galaxies with old stellar populations,
the Lick/IDS system of ~25 narrow-band indices is often used
(Worthey 1994; Worthey & Ottaviani 1997; Gorgas et al. 1999).
For actively star-forming galaxies, the 4000 Å break (Balogh
et al. 1999) and Balmer absorption line features, such as the
H$\delta_A$ index, provide important information about stellar
age and recent star formation history
Figure 9. A sample of DR7 galaxies with two observations has been extracted from the SDSS database and measurements of D4000
(top-left), H$\delta_A$ (top-middle) and Vdisp (top-right) are plotted against each other in the top panel. The corresponding PC representations
of these quantities are plotted against each other in the bottom panel.

Figure 10. This figure is similar to Figure 9, except that results are shown for multiply-observed CMASS galaxies.



(Kauffmann et al. 2003a). Finally, the velocity dispersion of
the stars in the galaxy is traditionally estimated by
comparing the width of stellar absorption features with those
in unbroadened stellar templates (see Appendix B of Bernardi
et al. 2003 for a review).

In Figure 3 of Section 3, we used model galaxy spectra to
illustrate how narrow-band spectral indices such as D4000
and H$\delta_A$ could be "recovered" from appropriate linear
combinations of the PCs. We also showed that the PCs could in
principle be used to recover an estimate of stellar velocity
dispersion. Figure 8 illustrates how well this works for real
galaxy spectra. We compare $Z + \sum_\alpha (X_\alpha \times
C_\alpha)$ (using the $X_\alpha$ and Z values in Table 1) with
actual measurements of D4000, H$\delta_A$ and stellar velocity
dispersion for a subset of galaxies drawn from the DR7 sample
(note that these measurements are drawn from the MPA/JHU
database). Since the purpose of this figure is to illustrate
how well traditional Lick indices and stellar velocity
dispersion estimates can be recovered in principle, we only
plot galaxies with spectra where the median S/N per pixel is
greater than 10.

We obtain very tight correlations between the appropriate
linear combinations $Z + \sum_\alpha (X_\alpha \times
C_\alpha)$ and both D4000 and H$\delta_A$. We were not able to
recover a good correlation with the velocity dispersion from
the SDSS pipeline unless we reduced the number of PCs from 7
to 5. As illustrated in Figure 4, PC6 should in principle
contribute significantly to our estimate of Vdisp. The
increase in scatter when using seven PCs instead of five is
attributable to the fact that there are large uncertainties in
measuring PC6 even for spectra with S/N per pixel greater than
10. Below, we will always use just 5 PCs when estimating
velocity dispersions, but 7 PCs when estimating all other
quantities.

We now demonstrate that $Z + \sum_\alpha (X_\alpha \times
C_\alpha)$ has higher signal-to-noise ratio than the direct
measurements of D4000, H$\delta_A$ and velocity dispersion for
typical galaxy spectra drawn from the DR7 and CMASS samples.
As we will show, the improvement is most striking for the low
S/N CMASS spectra. The upper panel of Figure 9 illustrates the
scatter obtained between the directly measured values of
D4000, H$\delta_A$ and velocity dispersion for two different
observations of the same galaxy in DR7. The bottom panel shows
the same comparison for the corresponding $Z + \sum_\alpha
(X_\alpha \times C_\alpha)$ representation of the same
quantities. Figure 10 displays the same comparison based on
repeat observations of CMASS galaxies; note that for this
sample, more than 85% of the galaxies have a median S/N per
pixel below 4. The improvement in S/N exhibited by the PC
representations of D4000 and H$\delta_A$ is highly significant
both for the DR7 galaxies and the CMASS galaxies. The
improvement in the PC-based velocity dispersion estimate
is only obvious for the low S/N CMASS sample.



5 APPLICATION OF THE PRINCIPAL 
COMPONENT METHOD TO STELLAR 
MASS ESTIMATION 

The application we wish to highlight in this paper (more
applications will follow in subsequent work) is the derivation
of robust PCA-based stellar masses for galaxies from
the DR7 and CMASS samples. For each sample, two sets
of stellar masses are estimated: the stellar mass measured
within the fiber and the total mass. In the case of DR7,
the masses are calculated by multiplying $M_*/L_z$ by the
z-band luminosity measured within the 3" SDSS-I spectrograph
fiber aperture, or by the luminosity $L_z$ derived from
the SDSS z-band "model" magnitude. For the CMASS sample,
$M_*/L_i$ is multiplied by the i-band luminosity measured
within a 2" diameter aperture matched to the smaller BOSS
spectrograph fibers, or by the i-band luminosity $L_i$ derived
from the SDSS i-band "cmodel" magnitudes. We note that
"model" and "cmodel" magnitudes are close to equivalent
for DR7 galaxies, but diverge at the fainter magnitudes of
the CMASS sample. In general "model" magnitudes are
recommended for characterizing the colors of extended objects,
since the light is measured consistently through the same
aperture in all bands, while the "cmodel" magnitudes provide
a more reliable estimate of the total flux from the galaxy
that accounts for the effects of local seeing.

Before we present science results using PCA-based 
masses, we compare them with the photometrically- 
derived ones. For the DR7 sample, photometric masses 
are given in the MPA/JHU catalog and derived from the 
u,g,r,i,z broad-band photometry as described in §2.1. 
For the BOSS sample, we do not have an independent 
set of photometrically-derived stellar masses for comparison.
However, we estimate the stellar mass-to-light ratio
$M_*(g,r,i,z)/L_i$ by fitting the observed g, r, i, z-band fiber
Figure 11. The ±1σ uncertainty interval in the logarithm of the
stellar mass for DR7 (left) and CMASS (right) galaxies. Results
are plotted as a function of the median S/N per pixel in the
spectrum. In both panels, red lines show the median value of P84
- P16 as a function of S/N; P84 and P16 are the 84th and 16th
percentiles of the cumulative PDF of the PCA-based stellar mass
estimate $\log M_*$(PCA) for each galaxy. Blue lines show the same
quantity for the photometrically-derived stellar masses. The black
lines are the median ±1σ statistical errors derived from repeat
observations.



fluxes using the same model library. We have not included
the u-band in our fits, because the CMASS galaxies are faint
in the u-band and the errors are large. In order to avoid any
uncertainties due to aperture effects, we use fiber masses
throughout this section.

Figure 11 shows the ±1σ uncertainty interval in $\log M_*$
for DR7 (left) and CMASS (right) galaxies. Results are plotted
as a function of the median S/N per pixel in the spectrum.
In both panels, the red lines show the median values
of P84 - P16 in each S/N bin; P84 and P16 are the 84th and
16th percentiles of the cumulative PDF of our PCA-based
stellar mass estimate $\log M_*$(PCA). The blue lines track the
same quantity for the photometrically-derived stellar mass
estimates. The black lines represent the 1σ scatter in the
$\log M_*$ estimates derived from repeat observations. For DR7
galaxies, the errors on the stellar mass are virtually identical
when using PCA to fit the spectra, or when fitting to the
photometry. For BOSS, the PCA method yields significantly
smaller errors, particularly at low S/N. This behaviour is
expected, because the BOSS galaxies with low S/N spectra are
faint galaxies where photometric errors tend to be large.

We also note that P84 - P16 of $\log M_*$(PCA) is in general
larger than the median ±1σ statistical scatter in $\log M_*$
derived from repeat observations. This feature arises because
the latter is only a measure of the error on $\log M_*$ due to
noise in the spectra; the former accounts for both noise and
the fact that different model galaxies with different $M_*/L$
values occupy the same region of PC-space.



5.1 Comparison with photometrically-derived 
stellar masses 

In Figure 12, we compare the PCA-based stellar masses
with the photometric masses. The top-left panel of Figure 12
shows the difference between $\log M_*$(PCA) and
$\log M_*$(MPA/JHU) as a function of S/N for DR7 galaxies.
The red line is the median, and the two green dashed lines
show the 68 percentile spread. There is a systematic ~0.05 dex
offset between the two mass estimates, but the scatter is
quite small (< ±0.1 dex at S/N > 10). The offset of our
Figure 12. The difference between our PCA-based stellar masses
and those derived from broad-band photometry is plotted as a
function of median S/N per pixel for DR7 galaxies (top-left)
and for the CMASS sample (top-right). In both panels, the
over-plotted red line is the median, and the two green dashed
lines show the 68% spread. Bottom-left: the median discrepancy
between PCA-based stellar masses and those derived from
g, r, i, z-band photometry as a function of spectral median S/N
per pixel for the combined DR7 and CMASS samples. Bottom-right:
histograms of the distribution of (P84 - P16), where P84 and
P16 are the 84th and 16th percentile points of the cumulative
PDF of the stellar mass estimates (derived using PCs) for DR7
galaxies (black) and for CMASS galaxies (blue). Red and green
dashed lines show histograms of the same quantity for stellar
masses derived from photometry for DR7 and BOSS, respectively.

PC-based stellar mass estimates to slightly higher values is 
expected given that our model library includes truncated 
SFHs and a smaller fraction of galaxies with recent bursts 
(10% instead of 50%). In the next section, we will discuss 
how different assumptions about the mixture of SFHs in the 
model library influence our stellar mass estimates. 

The top-right panel of Figure 12 shows the difference
between $\log M_*$(PCA) and $\log M_*(g,r,i,z)$ for CMASS
galaxies. Although these two sets of stellar masses are derived
using exactly the same model library, the PCA-based stellar
masses are typically 0.08 dex higher than the ones estimated
using the g, r, i, z-band photometry. In the bottom-left panel
of this figure, we plot the median discrepancy as a function
of spectral median S/N per pixel for the combined DR7 and
CMASS samples, finding that the offset disappears when
S/N > 8. We conclude that in the limit of high S/N, the
two methods of estimating stellar mass yield exactly the
same results.
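
The median and 68% spread curves used in these comparisons are running
statistics in bins of S/N. A minimal sketch of such a computation
follows; the function name, the bin edges and the half-open binning
convention are illustrative assumptions.

```python
import numpy as np


def binned_median_spread(x, y, edges):
    """Median and 16th/84th percentiles of y in bins of x.

    Returns an array of shape (n_bin, 3) with columns
    (median, p16, p84); empty bins are filled with NaN.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (x >= lo) & (x < hi)
        if not sel.any():
            rows.append((np.nan, np.nan, np.nan))
            continue
        p16, p50, p84 = np.percentile(y[sel], [16, 50, 84])
        rows.append((p50, p16, p84))
    return np.array(rows)
```

Here x would be the median S/N per pixel and y the difference between
the two logarithmic mass estimates of each galaxy.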

Finally, the bottom-right panel of Figure 12 shows the
distribution of the ±1σ errors (i.e. P84 - P16) for our
different sets of stellar mass estimates. Black and blue
histograms are for the PC-based stellar masses for DR7 and
CMASS. The dashed red and green histograms show error
distributions for the stellar masses derived from the
photometry. For DR7 galaxies, the ±1σ uncertainties on the
PCA-based and photometry-based masses peak at nearly the same
value, with the DR7 photometric measurements having slightly
less dispersion. However, for the CMASS sample, the errors
on the photometry-based masses are on average ~0.05 dex
larger than those derived using principal components, again
reflecting the fact that the photometric errors are larger for
this sample.

5.2 Dependence of our stellar mass estimates on 
the input parameters of the model library 

In this section, we study the sensitivity of our stellar mass 
estimates to the input parameters of the model library. We 
change the input SFHs, dust extinction values and metal- 
licity distributions in the model library one at a time, and 
quantify the effect on the stellar masses. 

5.2.1 Star formation histories 

In the previous section, we explained the 0.05 dex systematic
difference between $M_*$(PCA) and $M_*$(MPA/JHU) as a
consequence of the different SFHs used in generating the
libraries. To confirm this idea, we have generated a new set
of stellar mass estimates using a library in which the burst
fraction is increased from 10% to 50%. The top panel of
Figure 13 shows the difference in the stellar mass estimates
$\Delta\log M_* = \log M_*$(PCA) $- \log M_*$(PCA, 50% burst)
as a function of D4000, where $M_*$(PCA, 50% burst) is the
mass estimated using the library with a 50% burst fraction.
Once again the red line in the top panel of Figure 13 shows
the median value of $\Delta\log M_*$, while the two green
dashed lines show the 68% spread. In the range
1.4 < D4000 < 2.2, $\Delta\log M_*$ ~ 0.05, consistent with the
systematic offset found between our PC-based mass estimates
and the photometrically derived ones for the DR7 sample
in §5.1. At lower values of D4000, galaxies are constrained to
have formed a significant fraction of their stars recently, so
the difference in the fraction of bursty galaxies in the library
makes a much smaller difference to the results.

5.2.2 Dust extinction 

To check how assumptions about dust extinction influence
our stellar mass estimates, we generate a library without
dust extinction. The middle panel of Figure 13 shows the
difference $\Delta\log M_* = \log M_*$(PCA) $- \log M_*$(PCA,
nodust) as a function of D4000. $\Delta\log M_*$ increases as a
function of D4000 up to a value of ~0.08 at D4000 ~ 1.5, and
then remains approximately constant. The systematically
smaller stellar mass-to-light ratios derived using the library
with no dust extinction can be understood, because a smaller
fraction of the optical light from the galaxy is assumed to be
absorbed. If dust is not included in the models and D4000 is
large, the fit is able to re-adjust to match an older stellar
population, which has a higher mass-to-light ratio. At low
D4000 values, the stellar population is more tightly
constrained to be young, so the degeneracy is again less
important.

5.2.3 Metallicity 

To check how assumptions about metallicity influence our
stellar mass estimates, we generate a library with only solar
metallicity models. The bottom panel of Figure 13 shows the
difference $\Delta\log M_* = \log M_*$(PCA) $- \log M_*$(PCA, Zsolar) as
Figure 13. The dependence of stellar masses on the input
parameters of the model library. Top panel: the difference between
stellar masses based on the standard library and a library with a
50% burst fraction; middle panel: the difference between stellar
masses based on the standard library and a library with no dust
extinction; bottom panel: the difference between stellar masses
based on the standard library and a solar metallicity library. In
all these plots, the red lines denote the median of $\Delta\log M_*$,
and the two green dashed lines show the 68% spread.



a function of D4000. As can be seen, for the young popula- 
tions (D4000 <1.5) the systematic effects induced by adopt- 
ing incorrect metallicity assumptions are very small. For 
the older populations, the difference increases with D4000, 
which is indicative of the well-known age-metallicity degen- 
eracy. 

In summary, stellar mass estimates are most strongly 
affected by assumptions about dust extinction, but also by 
the SFHs and metallicity of the model library galaxies. In 
Figure 14. The difference in stellar masses derived from solar
metallicity libraries based on BC03 and M11 models.



all three cases, systematic offsets are of order 0.05-0.1 dex
in $\log M_*$. Note that we have not considered changes to the
IMF. To convert our Kroupa (2001) stellar masses to a Salpeter
(1955) or Chabrier (2003) IMF, one should add 0.18 dex to
or subtract 0.05 dex from the logarithm of the stellar masses,
respectively. We note, however, that the Salpeter (1955) IMF
is disfavoured by dynamical $M_*/L$ estimates of elliptical
galaxies (Cappellari et al. 2006).
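
The IMF conversion quoted above is a constant shift in the logarithm of
the stellar mass. A minimal sketch, using only the two offsets given in
the text (the dictionary and function names are hypothetical):

```python
# Offsets in dex to add to a Kroupa (2001) log stellar mass.
IMF_OFFSET_DEX = {"kroupa": 0.0, "salpeter": 0.18, "chabrier": -0.05}


def convert_log_mass(log_mass_kroupa, target_imf):
    """Shift a Kroupa-IMF log10 stellar mass to the requested IMF."""
    return log_mass_kroupa + IMF_OFFSET_DEX[target_imf.lower()]
```

For example, a galaxy with log M* = 11.0 for a Kroupa IMF corresponds to
11.18 for a Salpeter IMF and 10.95 for a Chabrier IMF.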



5.3 The dependence of stellar masses on the
assumed stellar population synthesis model

Kannappan & Gawiser (2007) compared photometrically-derived
stellar masses based on BC03 and Maraston (2005)
population synthesis models. These models are quite different
in the optical due to the use of different stellar libraries.
They found that BC03 models yield stellar mass estimates
that are ~1.3 times larger, even when no near-IR photometry
is used in the fits.

In this section, we investigate systematic uncertainties
that may arise in our PCA-based stellar mass estimates as
a result of our choice of stellar population synthesis model.
Maraston & Stromback (2011) present high spectral resolution
stellar population models using the MILES stellar library
(Sánchez-Blázquez et al. 2006) as input. The models
have been extended with SSPs based on theoretical stellar
libraries, which extend the wavelength coverage of the models
into the ultra-violet.

We compare results at solar metallicity, where the stellar
age coverage of the current version of the M11 model
we are using is most reliable. Figure 14 shows the difference
in the stellar masses estimated using these two libraries
as a function of D4000. The strongest systematic
discrepancy appears at young ages (low values of D4000),
and decreases for older stellar populations. The offset of
~0.12, consistent with the result of Kannappan & Gawiser
(2007), arises because the M11 models use Geneva tracks
(Schaller et al. 1992; Meynet et al. 1994) to model stellar
evolution, while the BC03 models make use of Padova
tracks (Alongi et al. 1993; Bressan et al. 1993; Fagotto et al.
1994a,b; Girardi et al. 1996). In these tracks different
assumptions are made regarding convective overshooting and
Figure 15. The distribution of our PCA-based stellar masses for
the BOSS CMASS sample.



the temperature and energetics of the Red Supergiant Phase, 
leading to a significantly redder supergiant phase in the Mil 
models. 

We note that the prescription for stellar mass loss is
somewhat different between the BC03 models and the Maraston
models, so we have used the BC03 mass loss prescriptions
when performing the comparison shown in Figure 14.



6 EVOLUTION OF MASSIVE GALAXIES TO 
REDSHIFT 0.6 

Figure 15 shows the distribution of total stellar masses based
on the PCA method for the BOSS CMASS sample^5. As can be seen,
the range of stellar masses spanned by this sample is rather
narrow: $11 \lesssim \log M_* \lesssim 12$. In the local
Universe, galaxies in this stellar mass range are mainly "red
and dead" systems with no ongoing star formation. Many reside in
groups and clusters and have radio jets, which are believed to
heat the surrounding gas, preventing it from cooling and forming
stars (Boehringer et al. 1993; McNamara et al. 2000; Fabian et
al. 2003; Best et al. 2005a,b, 2006; Bower et al. 2006; Croton
et al. 2006). According to the currently popular "down-sizing"
scenario for galaxy evolution, galaxies of this mass should have
completed their star formation at very early epochs ($z > 2$;
Heavens et al. 2004; Thomas et al. 2005). They may subsequently
grow through merging, but these merger events are believed to be
largely dissipationless (i.e. "gas-free"; Naab et al. 2006; Bell
et al. 2006; Scarlata et al. 2007; Kang et al. 2007). If this
picture is correct, we would expect the stellar populations of
very massive galaxies to evolve only "passively" over the
redshift interval from 0.6 to 0. In this section, we check
whether these expectations are correct.



5 In this section, we use total stellar masses for both DR7 and 
BOSS samples. When we use the PCA method to estimate the 
total stellar mass, the underlying assumption is that the stellar 
mass-to-light ratio within the fiber aperture is the same as that 
for the entire galaxy. 



6.1 Fraction of massive galaxies with young stars 

Our PC-decomposition technique can isolate galaxies which
have had more than a few percent of their stellar masses
formed in the last Gyr (see panel (h) of Figure 3). We explore
the robustness of the SFHs derived by the PCA method in
the Appendix. In this section, we define parameters $F(>5\%)$,
$F(>10\%)$ and $F(>15\%)$, which represent the fraction
of galaxies in which more than 5, 10 and 15% of the stellar
mass was formed in the last Gyr, respectively. Using DR7
and BOSS data, we study how these fractions have evolved
since $z \sim 0.6$ for galaxies more massive than
$\sim 2 \times 10^{11}\,M_\odot$.

Starting from the DR7 Main galaxy sample, we construct a
magnitude-limited sample of galaxies with $14.5 < r < 17.6$
and redshifts in the range $0.055 < z < 0.3$. We also limit
the sample to galaxies with ZWARNING = 0 and SPECPRIMARY = 1
to eliminate repeat observations and potential redshift errors.
These restrictions result in a final DR7 sample of ~430,000
galaxies. For each observed galaxy $i$, we define the quantities
$z_{{\rm min},i}$ and $z_{{\rm max},i}$ to be the minimum and
maximum redshift at which the galaxy would satisfy the apparent
r-band magnitude limits. Evolutionary and K-corrections are
included in this calculation (Li et al. 2007; Li & White 2009).
This allows us to define $V_{{\rm max},i}$ for the galaxy as
the total comoving volume of the survey between $z_1$ and $z_2$,
where $z_1$ is the maximum of $z_{{\rm min},i}$ and 0.055, and
$z_2$ is the minimum of $z_{{\rm max},i}$ and 0.3. $F(>X\%)$
can then be estimated as



$$F(>X\%) = \frac{\sum_{i=1}^{N_{\rm act}} 1/V_{{\rm max},i}}{\sum_{i=1}^{N_{\rm all}} 1/V_{{\rm max},i}} \qquad (13)$$



where the sum in the numerator extends over $N_{\rm act}$, the
number of galaxies in a given stellar mass bin that have formed
more than X% of their stars in the last Gyr, while the sum
in the denominator extends over $N_{\rm all}$, the total number
of galaxies in the same mass bin.
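As an illustration of the $1/V_{\rm max}$ weighting described above, the following is a minimal sketch and not the code used for this analysis. The function name and the input arrays (`vmax`, `is_active`) are hypothetical; `vmax` is assumed to hold the precomputed $V_{{\rm max},i}$ of every galaxy in one stellar mass bin, and `is_active` flags the galaxies that formed more than X% of their stars in the last Gyr.

```python
import numpy as np

def vmax_weighted_fraction(vmax, is_active):
    """Fraction of 'active' galaxies in one mass bin, weighted by 1/V_max.

    vmax      : V_max,i of each galaxy in the bin (any consistent volume unit)
    is_active : True where the galaxy formed more than X% of its stars
                in the last Gyr
    """
    weights = 1.0 / np.asarray(vmax, dtype=float)
    is_active = np.asarray(is_active, dtype=bool)
    # Numerator runs over the N_act active galaxies, denominator over all N_all.
    return weights[is_active].sum() / weights.sum()
```

Galaxies that could only be observed over a small volume receive a large weight, so the estimate approximates the fraction in a volume-limited sample.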

The red, black and blue lines in the left panel of Figure 16
show $\log F(>5\%)$, $\log F(>10\%)$ and $\log F(>15\%)$
as a function of stellar mass for the DR7 sample (errors are
derived from bootstrapping). As can be seen, all three
fractions decrease strongly and monotonically with increasing
stellar mass. At all stellar masses, there are 10 times more
galaxies that have formed more than 5% of their stars over
the last Gyr, compared to the number that have formed
more than 15% of their stars in the last Gyr.
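The bootstrap errors mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify the resampling scheme, so we assume here that whole galaxies within a mass bin are resampled with replacement and that the spread of the recomputed $1/V_{\rm max}$-weighted fraction is taken as the error. The function name, `n_boot` and `seed` are hypothetical.

```python
import numpy as np

def bootstrap_fraction_error(vmax, is_active, n_boot=200, seed=0):
    """Bootstrap standard deviation of the 1/V_max-weighted active fraction."""
    rng = np.random.default_rng(seed)
    weights = 1.0 / np.asarray(vmax, dtype=float)
    active = np.asarray(is_active, dtype=bool)
    n_gal = weights.size
    fractions = np.empty(n_boot)
    for k in range(n_boot):
        # Draw n_gal galaxies with replacement (assumed resampling scheme).
        idx = rng.integers(0, n_gal, n_gal)
        w, a = weights[idx], active[idx]
        fractions[k] = w[a].sum() / w.sum()
    return fractions.std(ddof=1)
```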

Before comparing these results with corresponding values
of F for CMASS galaxies at $z \sim 0.6$, we note that the
CMASS sample is not a simple magnitude-limited sample.
There is a $d_\perp > 0.55$ color cut, which means that blue
galaxies will be lost from the survey, particularly at the
lower redshift end. This means that the fraction of actively
star-forming galaxies that we compute for the CMASS galaxies
represents a lower limit to the true value.

Figure 16. The fraction of galaxies in which more than X% of the
stars formed in the last Gyr, as a function of stellar mass. Left
panel: The red, black and blue lines show $\log F(>5\%)$,
$\log F(>10\%)$ and $\log F(>15\%)$ for the DR7 sample. Middle
panel: The purple, red, blue and green lines (slightly displaced
along the x-axis by -0.1, 0.05, 0.0 and -0.05 dex, respectively)
are results for the redshifted DR7 samples at z = 0.45, 0.5, 0.55
and 0.6, after application of the CMASS color cuts, compared to
the results obtained for the unredshifted DR7 sample (in black).
Right panel: The dashed lines show $\log F(>5\%)$,
$\log F(>10\%)$ and $\log F(>15\%)$ for the CMASS sample with
$z > 0.54$ and $\log M_* > 11.4$; this is a lower limit since the
$d_\perp > 0.55$ constraint removes some blue galaxies from the
sample. Solid lines show the DR7 results for reference.

Because we do not know the underlying relation between color
and stellar mass for galaxies with $M_* > 2 \times 10^{11}\,M_\odot$
at $z \sim 0.55$ (existing surveys do not extend over wide enough
areas to sample large numbers of very massive galaxies), it is
difficult to correct for any missing blue galaxies. In order to
provide a more quantitative idea of the degree to which the
$d_\perp > 0.55$ cut might affect our estimates of the fraction
of galaxies with recent star formation, we use the K-correct
code (Blanton & Roweis 2007) to predict the colors of galaxies
in the DR7 sample at redshifts z = 0.45, 0.5, 0.55 and 0.6. At
each redshift, we select objects that pass the CMASS target
selection criteria. In addition, we define a redshift-dependent
lower stellar mass limit
$\log M_{\rm lim}/M_\odot = 2.0 \times z + 10.35$, so that a
passively-evolving galaxy at redshift z with stellar mass
$M_* > M_{\rm lim}$ would pass all the target selection criteria
in equation (1) (with the exception of $i_{\rm fiber2} < 21.5$,
which is more difficult to estimate unless one has a model for
the structure of the galaxy).
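The redshift-dependent mass limit quoted above is simple enough to write down directly. The sketch below only encodes that relation; the function names are hypothetical and the CMASS color and magnitude cuts of equation (1) are not reproduced here.

```python
def log_mass_limit(z):
    """log10(M_lim / M_sun) at redshift z, from the relation quoted in the text."""
    return 2.0 * z + 10.35

def above_mass_limit(log_mstar, z):
    """True if a galaxy with log10(M*/M_sun) = log_mstar exceeds the limit at z."""
    return log_mstar > log_mass_limit(z)
```

For example, the limit at z = 0.6 is close to log M = 11.55, so a galaxy with log M* = 11.6 at that redshift is retained while one with log M* = 11.2 is not.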

The colored lines in the middle panel of Figure 16 show
$\log F(>10\%)$, with $F(>10\%) = N_{\rm act}/N_{\rm all}$, for
the four redshifted DR7 samples (the purple, red, blue and green
lines have been slightly displaced along the x-axis by -0.1,
0.05, 0.0 and -0.05 dex, respectively), compared to the results
obtained for the unredshifted DR7 sample (in black). As can be
seen, the fraction of massive galaxies with recent star
formation could be underestimated by more than an order of
magnitude at $z < 0.55$. At higher redshifts, the fraction of
actively star-forming galaxies that is missed is closer to a
factor of ~2. We therefore select a sub-sample of the CMASS
galaxies with $0.54 < z < 0.7$ (0.54 is the median redshift of
the CMASS sample) and $\log M_*/M_\odot > 11.4$ as the main
high-redshift comparison sample for the DR7 massive galaxies.

In the right panel of Figure 16, the dashed lines show
$\log F(>5\%)$, $\log F(>10\%)$ and $\log F(>15\%)$ as a
function of stellar mass for this sample. Solid lines show the
DR7 results for reference. The $z_{{\rm min},i}$ and
$z_{{\rm max},i}$ values have been calculated for this sample
by evaluating the minimum and maximum redshifts at which the
galaxy would satisfy all the criteria in equation (1) except
$i_{\rm fiber2} < 21.5$. Evolutionary and K-corrections are
included in this calculation. $V_{{\rm max},i}$ is calculated
as the comoving volume of the survey between $z_1$ and $z_2$,
where $z_1$ is the maximum of $z_{{\rm min},i}$ and 0.54, and
$z_2$ is the minimum of $z_{{\rm max},i}$ and 0.7.

One conclusion from Figure 16 is that the fraction of actively
star-forming galaxies with $\log M_* > 11.4$ has evolved
strongly since a redshift of ~0.6. This result is not
surprising. The evolution of the dependence of star formation
on galaxy stellar mass has been studied using data from other
deep surveys (e.g. Zheng et al. 2007; Chen et al. 2009; Karim
et al. 2011) and, in general, the claim has been that the rate
of decline in cosmic SFR with redshift is the same at all
stellar masses. None of these previous surveys, however, have
extended to stellar masses as high as $10^{12}\,M_\odot$. In
the BOSS data, we find the striking result that the fraction
of actively star-forming galaxies flattens above a stellar
mass of $10^{11.5}\,M_\odot$ at $z \sim 0.6$. At the largest
stellar masses, therefore, the evolution in the fraction of
star-forming galaxies from $z \sim 0.6$ to the present day is
even more dramatic, reaching a factor of ~10 at
$\log M_* \sim 12$. We emphasize once again that these numbers
represent lower limits on the evolution, because of the
incompleteness issues described above.

Figure 17. The black line reproduces the black dashed line in
the right panel of Figure 16. The three low mass data points on
this black line are fitted with a linear relation
$\log F_*(>10\%) = -3 \times \log M_*/M_\odot + 33$. We randomly
generate a sample of 1,000,000 galaxies whose $\log M_*/M_\odot$
follows a Gaussian distribution over the range [11.1, 12.1],
with a peak at ~11.55 and 68% of them distributed over the range
11.37-11.73. We assign a value of $F_*$ to each galaxy so that
the linear relation is reproduced on average (red line on the
plot). We then add an error to each stellar mass using the error
distribution of CMASS galaxies as a function of $M_*$. This
mimic sample has a similar distribution in $\log M_*/M_\odot$
to our CMASS galaxies within $0.54 < z < 0.7$. We then recompute
the relation between $\log F_*(>10\%)$ and stellar mass; the
result is shown as the blue line.

Because the uncertainties in the BOSS stellar masses are larger
than our bin size, we have done tests to explore whether
"smearing" of the true stellar mass distribution could produce
the observed flattening. In Figure 17, the black line reproduces
the black dashed line in the right panel of Figure 16. We fit
the three low mass data points with a linear relation
$\log F_*(>10\%) = -3 \times \log M_*/M_\odot + 33$. We randomly
generate a sample of 1,000,000 galaxies whose $\log M_*/M_\odot$
follows a Gaussian distribution over the range [11.1, 12.1],
with a peak at ~11.55 and 68% of them distributed over the range
11.37-11.73. We assign a value of $F_*$ to each galaxy so that
the linear relation is reproduced on average (red line on the
plot). We add an error to each stellar mass using the error
distribution of CMASS galaxies as a function of $M_*$. This
mimic sample has a similar distribution in $\log M_*/M_\odot$
to our CMASS galaxies within $0.54 < z < 0.7$. We then recompute
the relation between $\log F_*(>10\%)$ and stellar mass, and the
result is shown as the blue line. The main conclusion from
Figure 17 is that errors will act to flatten the trend, but this
is a small effect compared to what is seen in the data. Smearing
by errors also cannot explain the characteristic mass scale of
$\log M_* = 11.6$ where the flattening appears to set in.
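The smearing test can be sketched with a short Monte Carlo. This is a simplified illustration, not the procedure used for Figure 17: each mock galaxy is flagged as "active" with a probability given by the fitted linear relation (an assumption standing in for the assignment of $F_*$ values), and a constant Gaussian mass error `sigma_err` (a hypothetical value) replaces the mass-dependent CMASS error distribution. The Gaussian width of 0.18 dex follows from the quoted 68% range 11.37-11.73.

```python
import numpy as np

def smearing_test(n_gal=1_000_000, sigma_err=0.15, seed=0):
    """Return bin centres, true active fraction, and fraction after mass errors."""
    rng = np.random.default_rng(seed)
    # Gaussian in log M*, peak 11.55, sigma 0.18, truncated to [11.1, 12.1].
    logm = rng.normal(11.55, 0.18, n_gal)
    logm = logm[(logm >= 11.1) & (logm <= 12.1)]
    # Probability of being active from log F = -3 log M + 33 (clipped to [0, 1]).
    p_active = np.clip(10.0 ** (-3.0 * logm + 33.0), 0.0, 1.0)
    active = rng.random(logm.size) < p_active
    # ASSUMPTION: constant Gaussian error instead of the CMASS error distribution.
    logm_obs = logm + rng.normal(0.0, sigma_err, logm.size)

    edges = np.linspace(11.1, 12.1, 6)  # five bins of 0.2 dex

    def binned_fraction(masses):
        n_all, _ = np.histogram(masses, bins=edges)
        n_act, _ = np.histogram(masses[active], bins=edges)
        return n_act / np.maximum(n_all, 1)

    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, binned_fraction(logm), binned_fraction(logm_obs)
```

Because the input relation is steep and the mass function falls off above the peak, scatter moves more low-mass (high-fraction) galaxies into the high-mass bins than the reverse, which flattens the recovered trend.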

6.2 AGN and star formation in massive galaxies 

The CMASS galaxies are massive ($M_* > 10^{11.2}\,M_\odot$)
and predominantly bulge-dominated (Masters et al. 2011); it is
therefore likely that they host supermassive black holes. In
bulge-dominated galaxies where star formation is ongoing, the
black holes are usually accreting actively (Heckman et al. 2004;
Kauffmann & Heckman 2009). We now turn to examining the black
hole accretion rate using the [O III] $\lambda$5007 line as an
indicator of the AGN's bolometric luminosity. Because the
individual spectra are noisy, we average them to create
high-S/N composites.

We present stacked spectra of CMASS galaxies with $F_* > X\%$
(X% = 5%, 10% or 15%), $\log M_*/M_\odot > 11.4$ and $z > 0.54$.
The galaxy spectra are first corrected for foreground Galactic
attenuation using the dust maps of Schlegel et al. (1998),
transformed from vacuum wavelengths to air, from flux densities
to luminosity densities, and shifted to the rest frame using the
redshift determined by the BOSS pipeline. The rest-frame spectra
are averaged with weight $1/V_{\rm max}$ (the weight of the bad
pixels identified in the SDSS mask array is set to zero).
Finally, we normalize the stacked spectra by their mean
luminosity in the wavelength range 4000-4080 Å. The black,
magenta and blue spectra in Figure 18 represent the normalized
stacks of CMASS galaxies with $F_* > 15\%$, $F_* > 10\%$ and
$F_* > 5\%$, respectively. As expected, the [O II] flux
increases for larger values of X and the spectra are also bluer.
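The averaging and normalization steps can be sketched as below. This is an illustration under assumptions that go beyond the text: the spectra are taken to be already dust-corrected, converted to luminosity density and resampled onto a common rest-frame wavelength grid, and the array names (`wave`, `lum`, `good`, `vmax`) are hypothetical.

```python
import numpy as np

def stack_spectra(wave, lum, good, vmax, norm_window=(4000.0, 4080.0)):
    """1/V_max-weighted mean spectrum, normalized in a continuum window.

    wave : (n_pix,) common rest-frame wavelength grid in Angstrom
    lum  : (n_gal, n_pix) luminosity densities
    good : (n_gal, n_pix) boolean mask, False for bad pixels (zero weight)
    vmax : (n_gal,) V_max of each galaxy
    """
    wave = np.asarray(wave, dtype=float)
    lum = np.asarray(lum, dtype=float)
    good = np.asarray(good, dtype=bool)
    weights = (1.0 / np.asarray(vmax, dtype=float))[:, None] * good
    lum_clean = np.where(good, lum, 0.0)  # keep NaN/garbage pixels out of the sum
    wsum = weights.sum(axis=0)
    stacked = np.full(wave.shape, np.nan)
    covered = wsum > 0  # pixels with at least one good contribution
    stacked[covered] = (weights * lum_clean).sum(axis=0)[covered] / wsum[covered]
    in_window = (wave >= norm_window[0]) & (wave <= norm_window[1])
    return stacked / np.nanmean(stacked[in_window])
```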

In the following, we will concentrate on the stack with
$F_* > 10\%$. In order to quantify how AGN contribute to the
line emission, we have selected DR7 galaxies with similar values
of D4000, H$\delta_A$, log([O III]/H$\beta$) and
log([O II]/H$\beta$). When we combine the spectra of these
"matched" DR7 galaxies, we find
log([N II]/H$\alpha$) $\approx -0.5$. The value of
log([O III]/H$\beta$), measured directly from the CMASS stack,
is ~0.2. This implies that around half of the [O III] luminosity
in the stack is contributed by AGNs (Kauffmann et al. 2003b;
Kauffmann & Heckman 2009). We fit the stacked CMASS spectrum as
a non-negative linear combination of single stellar population
models, with dust attenuation modeled as an additional free
parameter (Brinchmann et al. 2004; Tremonti et al. 2004). This
fit yields a continuum V-band dust extinction value of 1.76.
Taking a mean bolometric correction to the extinction-corrected
[O III] of 600 (Kauffmann & Heckman 2009), and assuming that
half of the [O III] emission is coming from the AGNs, we find
$L/L_{\rm Edd} \approx 0.01$, where
$L_{\rm Edd} = 1.38 \times 10^{38}\,M_{\rm BH}/M_\odot$ erg
s$^{-1}$ is the Eddington luminosity and $M_{\rm BH}$ is
estimated from the median value of the velocity dispersions of
the galaxies that go into the stack using the formula given in
Graham et al. (2011)^6.
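The arithmetic behind the Eddington ratio can be made explicit with a small sketch. The bolometric correction of 600, the AGN fraction of one half and the coefficient $1.38 \times 10^{38}$ erg s$^{-1}$ per solar mass are taken from the text; the function name and its inputs are hypothetical, and the black hole mass is assumed to be supplied by the caller rather than derived from a velocity dispersion here.

```python
def eddington_ratio(l_oiii_corr, m_bh, agn_fraction=0.5, bol_corr=600.0):
    """L_bol / L_Edd from an extinction-corrected [O III] luminosity.

    l_oiii_corr  : extinction-corrected [O III] luminosity in erg/s
    m_bh         : black hole mass in solar masses
    agn_fraction : share of the [O III] emission attributed to the AGN
    bol_corr     : bolometric correction applied to the AGN [O III] luminosity
    """
    l_bol = bol_corr * agn_fraction * l_oiii_corr  # erg/s
    l_edd = 1.38e38 * m_bh                         # erg/s
    return l_bol / l_edd
```

As a purely illustrative case (these inputs are not measurements from the stack), a corrected [O III] luminosity of $4.6 \times 10^{42}$ erg s$^{-1}$ and a $10^9\,M_\odot$ black hole give a ratio of about 0.01.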

We have also cross-matched the BOSS and FIRST surveys, and
found that ~2.4% of CMASS galaxies have FIRST detections. The
typical i-band magnitude and mass of this radio-loud sample are
19.6 and $10^{11.6}\,M_\odot$, respectively. For this
radio-detected sample, we construct a control sample of galaxies
matched in redshift, stellar mass and velocity dispersion that
are located in the FIRST survey area but lack radio detections.
Interestingly, the fraction of galaxies with recent star
formation ($F_* > 10\%$) is 2-2.5 times smaller in the
radio-loud sub-sample than for the controls. We will study the
difference between radio-loud and radio-quiet CMASS galaxies in
more detail in future work.

The red spectrum in Figure 18 is a stack of DR7 galaxies with
$F_* < 10\%$, $\log M_*/M_\odot > 11.4$ and $0.2 < z < 0.25$.
In order to make a fair comparison between DR7 and BOSS
"inactive" galaxies, we construct a twin sample from CMASS
galaxies in the redshift range 0.5-0.55, which has exactly the
same stellar mass distribution and $F_* < 10\%$. The stacked
spectrum of this twin sample is shown in green. (Note that the
red and green spectra have both been shifted down by 0.7
relative to the other three spectra. The stacks are generated in
the same way as for the "active" star-forming galaxies, except
that we use equal weights rather than $1/V_{\rm max}$.) There is
no apparent difference in the spectral shape and absorption line
features of the stacks of DR7 and BOSS "inactive" galaxies.

6 Graham et al. (2011) suggest a slope of 5 rather than 4
(Tremaine et al. 2002; Graham 2008; Graham & Li 2009) for the
relation between black hole mass and stellar velocity
dispersion, which is also what Hu (2008) found when considering
only the massive galaxies.

Figure 18. The black, magenta and blue spectra represent the
stacks of galaxies with $F_* > 15\%$, $F_* > 10\%$ and
$F_* > 5\%$ in the CMASS sample, respectively. The red spectrum
is the stack of "inactive" ($F_* < 10\%$) DR7 galaxies with
$\log M_*/M_\odot > 11.4$ and $0.2 < z < 0.25$; the green
spectrum shows its CMASS twin sample, which has exactly the same
mass distribution, with $F_* < 10\%$ and $0.5 < z < 0.55$. The
inset highlights the region near the [O II] line.

In summary, we conclude that at $z \sim 0.6$ at least 2% of
very massive galaxies have formed more than 10% of their stars
in the last Gyr and nearly 10% have formed more than 5% of their
stars over this period. We note that for a galaxy with
$M_* \sim 10^{12}\,M_\odot$ to be counted as a member of the
$F_* > 10\%$ "class", it must have processed more than
$10^{11}\,M_\odot$ of gas over the last Gyr or so, i.e. 6-8
times as much gas as is contained in the Milky Way! If this gas
is in molecular form, it should be easily detectable at
$z \sim 0.6$. More detailed studies of these objects will reveal
important insights into the physical processes that govern the
evolution of massive galaxies at late times. Kaviraj et al.
(2008) studied recent star formation in massive galaxies at
$z \sim 0.6$ using rest-frame UV data, and found young stars at
levels of a few percent by mass fraction. Based on a strong
correspondence between the presence of star formation (traced by
UV colours) and the presence of morphological disturbances,
Kaviraj et al. (2011) suggested that the star formation is
merger-driven. The major merger rate at late epochs ($z < 1$) is
predicted to be too low to produce the observed number of
disturbed LRGs, so the authors invoked minor mergers as an
alternative mechanism. Future kinematic studies of larger
samples of such galaxies would help test this hypothesis.

APPENDIX

Here we explore the robustness of the SFHs that we infer.
Our analysis is based on rest-frame optical wavelengths, which
contain a large contribution from intermediate age stars;
therefore our code does best at recovering the SFR averaged
over the last Gyr. In contrast, commonly used SFR tracers
such as H$\alpha$, the far-ultraviolet and the far-infrared
trace star formation on timescales of ~10-100 Myr (Kennicutt
1998).

Figure 19. The difference between our input, $F_*$(in), and
output, $F_*$(out), in tests using simulated data. As described
in §6, our suite of 5000 test spectra have values of SFR, $M_*$,
$Z_*$ and $\tau_V$ drawn from DR7 star-forming galaxies. The top
and bottom panels show simulated DR7 and BOSS data, while the
left and right panels show the $\log F_*$ residuals versus
stellar mass and V-band dust attenuation. The red solid line
denotes the median and the green dashed lines enclose 68% of the
data points. A black error bar denotes the median $\pm 1\sigma$
error of the PCA-derived parameters. There is good agreement
between the derived errors and the scatter in the input and
output parameters, and only weak evidence for a systematic
trend. This suggests that the PCA technique is relatively robust
against degeneracies between age, dust and metallicity, and is
able to accurately recover the SFR in the last Gyr when the data
are well represented by the model grid.

The recovery of galaxy star formation histories from integrated
spectra is a difficult problem due to well-known degeneracies
between age, metallicity and dust attenuation. We explore the
effect of these degeneracies on our SFR estimates in two ways:
a) We generate synthetic data where the input parameters are
well known and test the ability of our PCA-based algorithm to
recover the true SFR. b) We use the real data to test whether
our PCA-based SFR estimates give answers that are consistent
with SFRs derived from nebular emission lines. In the real
universe there are strong correlations between SFR, stellar
mass, metallicity and dust attenuation (c.f., Brinchmann et al.
2004; Tremonti et al. 2004; Gallazzi et al. 2005; Asari et al.
2007). To ensure that our suite of test models reflected these
parameter correlations, we used DR7 galaxies to define the input
parameters of our model. We randomly selected 1000 SDSS DR7
star-forming galaxies and tabulated their $M_*/L$ ratio, V-band
dust attenuation of young stars ($\tau_V = A_V/1.086$), nebular
metallicity and SFR/$M_*$ from the MPA/JHU catalog. (The dust
attenuation, metallicity and SFR have been estimated from the
nebular lines (Brinchmann et al. 2004). We assume that the
stellar metallicity, $Z_*$, is 0.4 dex lower than the nebular
metallicity, as found by Gallazzi et al. 2005.) For each galaxy
we identified all the models in our library that were within
$\pm 0.1$ dex in $\log(M_*/L)$, $\log({\rm SFR}/M_*)$, $Z_*$ and
$\tau_V$, and we randomly selected 5 models from this subset. We
used the error array of the SDSS spectrum to add realistic
random errors to each of the model spectra. We then applied our
PCA analysis to the ~5000 simulated spectra and estimated $F_*$.
In the top panel of Figure 19 we compare the input and output
values of $\log F_*$ as a function of stellar mass and dust
attenuation for our simulated DR7 galaxies. In the lower panel
we show the result of a similar exercise where we have
substituted the error arrays of randomly selected BOSS galaxies
to explore our ability to recover $F_*$ at the low S/N typical
of BOSS. For the simulated DR7 and BOSS spectra, we recover
$F_*$ to within $\pm 0.1$ and $\pm 0.2$ dex, respectively. There
is a small systematic trend with dust attenuation that is
evident in the noisier BOSS data, but this produces only a very
weak systematic trend with stellar mass (less than 0.05 dex over
two orders of magnitude in stellar mass). Thus, our PCA
technique appears to accurately recover the SFR in the last Gyr
in the case where the input data are well represented by the
models.
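The noise-injection step of the simulation can be sketched as follows. This is a minimal illustration under assumptions not stated in the text: the model spectrum is taken to be already scaled to the flux level of the observed spectrum and sampled on the same pixels as its error array, and the errors are treated as independent and Gaussian. The function and argument names are hypothetical.

```python
import numpy as np

def add_noise(model_flux, error_array, seed=None):
    """Perturb a model spectrum with per-pixel Gaussian noise.

    model_flux  : (n_pix,) model spectrum, scaled to the observed flux level
    error_array : (n_pix,) 1-sigma error of the observed spectrum per pixel
    """
    rng = np.random.default_rng(seed)
    model_flux = np.asarray(model_flux, dtype=float)
    error_array = np.asarray(error_array, dtype=float)
    # One independent standard-normal draw per pixel, scaled by that pixel's error.
    return model_flux + rng.standard_normal(model_flux.shape) * error_array
```

Swapping in the error array of a BOSS spectrum instead of an SDSS DR7 one gives the lower-S/N version of the same test.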

The next question is whether our choice of priors influences
the derived value of $F_*$. We have done a variety of tests
similar to those outlined in §5 and find that our derived SFHs
are generally insensitive to changes in our input model grid.
Not surprisingly, the parameter that is most important is the
fraction of galaxies with bursts in the input model library. In
Figure 20 we plot
$\Delta F_* = F_*({\rm PCA}) - F_*({\rm PCA}, 50\%)$ as a
function of $F_*({\rm PCA})$, where $F_*({\rm PCA})$ is our
estimate of the fraction of stars formed in the last Gyr using
the fiducial model library and $F_*({\rm PCA}, 50\%)$ is the
fraction estimated using the library with a 50% burst fraction.
Although the difference in the two estimates is an increasing
function of $F_*({\rm PCA})$, in percentage terms the two
estimates give results that differ very little. Moreover, the
bulk of our galaxies have very small values of $F_*$, where the
difference is negligible.

Figure 20. $\Delta F_* = F_*({\rm PCA}) - F_*({\rm PCA}, 50\%)$
as a function of $F_*({\rm PCA})$, where $F_*({\rm PCA})$ is the
fraction of stars formed in the last Gyr using the fiducial
library, and $F_*({\rm PCA}, 50\%)$ is the fraction of stars
formed in the last Gyr using the library with 50% burst
fraction. The red line denotes the median value of $\Delta F_*$,
and the two green dashed lines show the 68% spread in
$\Delta F_*$.

Finally, for the DR7 star-forming galaxies, we directly compared
the current SFR inferred from the nebular emission lines
(Brinchmann et al. 2004) to our PCA-based estimates of the
average SFR in the last Gyr
(${\rm SFR} = F_* \times M_*/10^9$ yr). The scatter at fixed
stellar mass is 0.2 dex. Part of this may stem from real
differences in the SFR over the timescales probed (10 Myr vs. 1
Gyr). Curiously, we find systematic trends with stellar mass and
dust attenuation that are not present in our tests with
simulated data (Fig. 19). For the real data, the PCA technique
appears to underestimate the SFR inferred from
extinction-corrected H$\alpha$ by as much as 0.4 dex in the most
massive ($\log(M_*/M_\odot) > 10.8$) or dusty ($\tau_V > 3$)
galaxies. A systematic trend of this nature is difficult to
explain by invoking star formation history differences, as this
would imply that all massive galaxies are in the midst of a
burst at the current epoch. We note that similar discrepancies
have been found in other works that compare information inferred
from the rest-frame optical continuum and the nebular lines
(c.f., Tanaka 2011; Hoversten & Glazebrook 2008; Gunawardhana et
al. 2011). This suggests that commonly adopted model assumptions
regarding dust attenuation or the initial mass function may be
too simplistic. For instance, there is some evidence for a dust
component associated with intermediate age stars (Eminian et al.
2008). Dust attenuation is also likely to be inhomogeneous
within a given galaxy, and the net effect may not be well
approximated by our two 'effective' global parameters, $\mu$ and
$\tau_V$. The importance of dust inhomogeneities has been
demonstrated in a study of 9 local galaxies using multiband
photometry (Zibetti et al. 2009). Further exploration of the
differences between spatially resolved and unresolved SFR and
mass estimates will be possible with the next generation of
integral field unit galaxy surveys (c.f., Sanchez et al. 2011).
We defer a full analysis of the difference between PCA and
nebular estimates of the SFR to future work. In §7 we will
compare PCA-derived $F_*$ values